PREFACE TO VOLUME 3


This is the final volume of our treatise. It has taken longer to write than we had 
hoped. To some extent this has been due to our involvement with other work, but 
it is also attributable to the amount of development which has been going on in recent 
years in the subjects dealt with in this volume, which are the analysis of variance, 
the design of experiments, sample survey theory, multivariate analysis, and time-series. 
It becomes increasingly difficult to know what is permanent and what is ephemeral 
in the spate of current research. In deciding what to omit and what to admit, there 
have been occasions when we have been reminded of what Thackeray said about 
Macaulay, that he read a book to write a sentence. 

As with the first two volumes, this one is self-contained in three respects: it lists 
its own references, it contains such Appendix Tables as are necessary to follow the 
text, and it has its own index. Now that the Kendall-Doig Bibliography of Statistical 
Literature is available, a comprehensive bibliography is unnecessary. As before, 
extensive sets of exercises are provided at the ends of chapters. 

We have again to thank Mr. E. V. Burke of Charles Griffin and Company Limited 
for the care he has given to the production of this work. 

We are also grateful to many reviewers and correspondents who have commented on 
errors, misprints and obscurities in the first two volumes, and shall be equally glad to 
be notified of any that may be found in this final volume. 

M. G. K. 
A. S. 
LONDON 
August, 1966 


PREFACE TO SECOND EDITION 


This edition is little different from the first. A number of minor improvements 
have been made in the text, a few new exercises added, and many references given to 
new research in the rapidly developing areas covered by this volume. As always, 
readers have been most helpful in bringing misprints and other defects to our attention, 
and we hope that they will continue to do so. 

M. G. K.
A. S.
LONDON
May, 1968
GLOSSARY OF ABBREVIATIONS


The following abbreviations are sometimes used:

ARE        Asymptotic Relative Efficiency
ASN        Average Sample Number
AV         Analysis of Variance
BAN        Best Asymptotically Normal
BCR        Best Critical Region
BIB        Balanced Incomplete Blocks
c.f.       characteristic function
c.g.f.     cumulant-generating function
d.f.       distribution function
d.fr.      degrees of freedom
d.p.       decimal places
f.f.       frequency function
f.m.g.f.   factorial moment-generating function
g.f.       generating function
LF         Likelihood Function
LR         Likelihood Ratio
LS         Least Squares
MCS        Minimum Chi-Square
m.g.f.     moment-generating function
ML         Maximum Likelihood
MS         Mean Square
MV         Minimum Variance
MVB        Minimum Variance Bound
N(a, b)    (multi-)normal with mean (-vector) a and variance (dispersion matrix) b
OC         Operating Characteristic
PBIB       Partially Balanced Incomplete Blocks
pps        probability proportional to size
s.e.       standard error
SPR        Sequential Probability Ratio
SS         Sum of Squares
UMP        Uniformly Most Powerful
UMPU       Uniformly Most Powerful Unbiassed
USF        Uniform Sampling Fraction


CHAPTER 35


ANALYSIS OF VARIANCE IN THE LINEAR MODEL: 
CLASSIFIED DATA 


35.1 In developing the MV unbiassed linear estimation properties of the LS
estimator (19.12, Vol. 2) in the linear model $y = X\theta + \epsilon$ at (19.8), we observed at (19.42)
that the sum of squares (SS) of the observations may be written identically as the sum
of two non-negative components

$$y'y = (y - X\hat\theta)'(y - X\hat\theta) + (X\hat\theta)'(X\hat\theta) \tag{35.1}$$

of which the first is the sum of squared residuals (Residual SS) from the model fitted
by LS. The second component on the right of (35.1) is the reduction in the SS due
to the fitted model; the greater this reduction is (i.e. the smaller the Residual SS is),
the more satisfactorily the fitted model represents the y-X relationship in the
observations. If we rewrite (35.1) as

$$y'y = (y - X\hat\theta)'(y - X\hat\theta) + \hat\theta'(X'X)\hat\theta \tag{35.2}$$

and recall from (19.16) that

$$V(\hat\theta) = \sigma^2 (X'X)^{-1} \tag{35.3}$$

we see that if the error-vector $\epsilon$ in the model is normally distributed, $\hat\theta$, being a linear
function of $\epsilon$, will by 15.4 be multinormally distributed with mean $\theta$ and dispersion
matrix (35.3). The last term on the right of (35.2) is therefore the exponent of this
multinormal distribution apart from the factor $-(2\sigma^2)^{-1}$, and by 24.6, $\hat\theta'X'X\hat\theta/\sigma^2$ is
distributed in the non-central $\chi^2$ form (24.18) with degrees of freedom $\nu = k$ and
non-central parameter

$$\lambda = \theta' V^{-1}(\hat\theta)\,\theta = \theta'X'X\theta/\sigma^2. \tag{35.4}$$

For brevity we write this distribution $\chi'^2(\nu, \lambda)$ as in 24.5.
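
The identity (35.1)-(35.2) is easily checked numerically. The following is a minimal sketch in plain Python, not part of the original treatment: the helper names `solve` and `ss_decomposition` and the small data set are illustrative assumptions. It fits the model by solving the normal equations and confirms that the Residual SS and the reduction due to the fitted model add to y'y.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ss_decomposition(X, y):
    """Return (y'y, Residual SS, theta_hat' X'X theta_hat) for the LS fit of y on X."""
    n, k = len(X), len(X[0])
    xtx = [[sum(X[t][i] * X[t][j] for t in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(X[t][i] * y[t] for t in range(n)) for i in range(k)]
    theta = solve(xtx, xty)  # normal equations X'X theta = X'y
    fitted = [sum(X[t][i] * theta[i] for i in range(k)) for t in range(n)]
    total_ss = sum(v * v for v in y)
    residual_ss = sum((y[t] - fitted[t]) ** 2 for t in range(n))
    model_ss = sum(theta[i] * xtx[i][j] * theta[j] for i in range(k) for j in range(k))
    return total_ss, residual_ss, model_ss


# Illustrative data (an assumption): a constant column and a linear trend.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 2.0, 5.0]
total_ss, residual_ss, model_ss = ss_decomposition(X, y)
```

Here the two components are computed independently of each other, so their agreement with y'y is a genuine check of the identity rather than a consequence of subtraction.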


35.2 This result enables us to test the hypothesis that $\theta = \theta_0$, and in particular
to test

$$H_0 : \theta = 0, \tag{35.5}$$

for then $\lambda$ defined by (35.4) is zero, and the distribution becomes a (central) $\chi^2$ with k degrees
of freedom (d.fr.). As we saw in 19.11-12, $(y - X\hat\theta)'(y - X\hat\theta)/\sigma^2$ is a $\chi^2$ with $(n-k)$
d.fr., and $y'y/\sigma^2$ is a $\chi^2$ with n d.fr. when (35.5) holds; Cochran's theorem of 15.16
applies, and the two components on the right of (35.2) are independently distributed.
Their ratio (after division by d.fr.)

$$F = \{\hat\theta'X'X\hat\theta/k\} \,/\, \{(y - X\hat\theta)'(y - X\hat\theta)/(n-k)\} \tag{35.6}$$

has the variance-ratio F-distribution with $(k, n-k)$ d.fr. when (35.5) holds.
If we wish to investigate the power of a test of (35.5) based on (35.6), we require its
distribution when (35.5) does not hold. In order to show that it is a non-central F as
at (24.105), we must prove that the numerator and denominator of (35.6) remain
independent when $\theta \neq 0$. If we wish to test the more general hypothesis that $\theta = \theta_0 \neq 0$,
we require this distribution in order to make a test at all. Thus we need an extension 
of Cochran's theorem (15.16) to non-central normal variables, i.e. normal variables 
with means not all equal. 


35.3 Apart from this particular need, the form of (35.2) is suggestive in another
way. Suppose that X'X is a diagonal matrix, say C, with diagonal elements $c_{ii}$. The
last term on the right of (35.2) can then be further separated into

$$(X\hat\theta)'(X\hat\theta) = \sum_{i=1}^{k} c_{ii}\hat\theta_i^2. \tag{35.7}$$

The elements $c_{ii}$ are positive, since X'X is a positive definite (non-singular) matrix.
(35.7) expresses the reduction in the SS as the sum of k parts, one corresponding to
each parameter. Here again, we may be interested in testing hypotheses concerning
individual $\theta_i$, and require the distribution of the components $c_{ii}\hat\theta_i^2$ when $\theta \neq 0$.

If X'X is diagonal, so is (35.3), and the linear model is called orthogonal since the
$\hat\theta_i$ are uncorrelated, and actually independent when $\epsilon$ is normal. We have already
discussed orthogonal models in the context of regression theory in 28.15-20, Vol. 2, 
where we were concerned with the use of orthogonal polynomials. (28.72) defined 
(and Example 28.3 illustrated) the procedure of evaluating the reduction in the SS 
due to each further parameter, using an entirely intuitive justification. Our present 
discussion will be more general. 
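
As an illustration of (35.7), the sketch below (plain Python; the function name `orthogonal_components` and the data are assumptions made for the example) takes a design whose columns are mutually orthogonal, so that X'X is diagonal, and splits the reduction in the SS into one part for each parameter.

```python
def orthogonal_components(X, y, tol=1e-12):
    """For X with mutually orthogonal columns, return the parts c_ii * theta_i^2 and the LS estimates."""
    n, k = len(X), len(X[0])
    for i in range(k):
        for j in range(i + 1, k):
            if abs(sum(X[t][i] * X[t][j] for t in range(n))) > tol:
                raise ValueError("columns %d and %d are not orthogonal" % (i, j))
    c = [sum(X[t][i] ** 2 for t in range(n)) for i in range(k)]          # diagonal of X'X
    theta = [sum(X[t][i] * y[t] for t in range(n)) / c[i] for i in range(k)]
    parts = [c[i] * theta[i] ** 2 for i in range(k)]                     # one part per parameter
    return parts, theta


# Illustrative data (an assumption): a constant column and a +1/-1 contrast.
X = [[1.0, -1.0], [1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
y = [2.0, 4.0, 3.0, 7.0]
parts, theta = orthogonal_components(X, y)
residual_ss = sum(v * v for v in y) - sum(parts)
```

Because the columns are orthogonal, each estimate is obtained from its own column alone, and dropping one parameter from the model leaves the part attributed to the other unchanged.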


Analysis of variance 

35.4 We now introduce a fundamental concept, originally developed by R. A.
Fisher in the 1920's. If the SS y'y can be expressed as the sum of non-negative
components, each of which corresponds to a subset of the parameters of the linear 
model, we call this an analysis of variance (AV) on y. (It would be more appropriate 
to call it an analysis of SS, but history and brevity are against this logical usage.) An 
AV is interpreted as a separating-out of the influences of the different subsets of 
parameters upon the observations y. The importance of such separations in many
fields of investigation makes AV the central technique of much applied statistics.


Decomposition of non-central quadratic forms 
35.5 We now state a general AV problem. Suppose that y is a vector of p
independent normal variates with

$$E(y) = \mu, \qquad V(y) = I, \tag{35.8}$$

and that

$$y'y = \sum_{i=1}^{k} y'A_i y, \tag{35.9}$$

where $A_i$ has rank $r_i$. Under what conditions will the k quadratic forms $Q_i = y'A_i y$
be independently distributed, and what will their distributions be? Since, by 24.4,
the distribution of $y'y = Q$ is a $\chi'^2(p, \mu'\mu)$, we may expect the components to have the
same distributional form.

This differs from the problem considered in 15.16 only in that there we had $\mu = 0$.
We saw there that any one of the three conditions

(a) $\sum_{i=1}^{k} r_i = p$, the rank of Q,
(b) each $A_i$ is idempotent, i.e. $A_i = A_i^2$,
(c) $A_i A_j = 0$, all $i \neq j$,

implies the other two. Re-examination of the proof of this in 15.17-19 will reveal
that it did not depend on the value of $\mu$ at all. Neither did the proof (in 15.13) of
Craig's theorem that $Q_1$ and $Q_2$ are independent if and only if $A_1 A_2 = 0$, which shows
that (c) is equivalent to

(c') each $Q_i$ is independent of every other.

However, the equivalence of (b) and the statement

(b$^0$) each $Q_i$ is a (central) $\chi^2$ variable with $r_i$ d.fr.,

depended upon the result of 15.11 that if $\mu = 0$, $Q_i$ is a $\chi^2$ variable if and only if $A_i$
is idempotent. It is thus (b$^0$) which requires to be generalized through a
generalization of 15.11 to $\mu \neq 0$.


35.6 The only essential change brought about in 15.11 is that the canonically
transformed variable $y_i^2$ in (15.43) is now a $\chi'^2(1, \mu_i^2)$ variable, by 24.4. The c.g.f. of
$\sum_{i=1}^{r} a_i y_i^2$ is therefore not (15.45), but the more general form obtained from Exercise 24.1,
which yields for the cumulants of Q

$$\kappa_s = 2^{s-1}(s-1)! \sum_{i=1}^{r} a_i^s (1 + s\mu_i^2) \tag{35.10}$$

(the generalization of (15.46)), and also shows that the cumulants of a $\chi'^2(\nu, \lambda)$ variable
are

$$\kappa_s = 2^{s-1}(s-1)!\,(\nu + s\lambda). \tag{35.11}$$

For (35.10) and (35.11) to be identical, we must have

$$\sum_{i=1}^{r} a_i^s = \nu, \qquad \sum_{i=1}^{r} a_i^s \mu_i^2 = \lambda, \qquad \text{all } s. \tag{35.12}$$

(35.12) is satisfied if and only if every $a_i = 1$, so that $\nu = r$ and $\lambda = \sum_{i=1}^{r} \mu_i^2$. Since
the $a_i$ are the non-zero latent roots of A, it follows that A is idempotent. We thus
see that, for general $\mu$, the statement equivalent to (b) above is

(b') each $Q_i$ is a $\chi'^2(r_i, \lambda_i)$ variable,

reducing to (b$^0$) when $\mu = 0$. Moreover, if we transform orthogonally back from the
canonical to the original variables, we see at once that $\lambda_i = \mu'A_i\mu$, and
$\sum_{i=1}^{k} \lambda_i = \mu'\mu$, the non-central parameters of the $Q_i$ adding to that of Q (cf. Exercise 24.1).

We have thus reached the conclusion that if (35.8-9) hold, any of the conditions
(a), (b), (c) implies the other two; equivalently, any of the conditions (a), (b') and (c')
implies the other two of them.
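
The conditions (a), (b), (c) and the additivity of the non-central parameters can be illustrated on the most familiar decomposition, $y'y = y'A_1 y + y'A_2 y$ with $A_1 = 11'/p$ and $A_2 = I - 11'/p$, which separates the mean from the deviations about it. The sketch below is in plain Python; the choice of matrices, the mean vector and the helper names are illustrative assumptions, and rank is obtained as the trace, which is valid only because the matrices are idempotent.

```python
def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)] for i in range(n)]


def is_zero(a, tol=1e-12):
    return all(abs(v) <= tol for row in a for v in row)


def is_idempotent(a, tol=1e-12):
    sq = matmul(a, a)
    return all(abs(sq[i][j] - a[i][j]) <= tol for i in range(len(a)) for j in range(len(a)))


def quad(v, a):
    """Quadratic form v'Av."""
    n = len(v)
    return sum(v[i] * a[i][j] * v[j] for i in range(n) for j in range(n))


p = 4
A1 = [[1.0 / p] * p for _ in range(p)]                                              # 11'/p
A2 = [[(1.0 if i == j else 0.0) - 1.0 / p for j in range(p)] for i in range(p)]      # I - 11'/p

rank1 = sum(A1[i][i] for i in range(p))   # trace = rank for an idempotent matrix
rank2 = sum(A2[i][i] for i in range(p))

mu = [1.0, 2.0, 3.0, 6.0]                 # hypothetical mean vector
lam1, lam2 = quad(mu, A1), quad(mu, A2)   # lambda_i = mu' A_i mu
```

With these values the ranks are 1 and 3, adding to p = 4, and the non-central parameters are 36 and 14, adding to $\mu'\mu = 50$.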
35.7 The result of 35.6 is unaffected if, in (35.9), y'y is replaced by $Q = y'Ay$,
where A is any idempotent matrix with rank $r < p$. The argument justifying this in
Chapter 15 for the case $\mu = 0$ is valid for general $\mu$.

Even if V(y) in (35.8) is generalized so that the components of y have a non-singular
multinormal distribution with dispersion matrix V which is non-diagonal, the result
is only slightly changed. For if

$$E(y) = \mu, \qquad V(y) = V, \tag{35.13}$$

and

$$Q = y'Ay = \sum_{i=1}^{k} y'A_i y = \sum_{i=1}^{k} Q_i, \tag{35.14}$$

we may write $V = TT'$ and the transformation $y = Tz$ produces independent normal
variables z, since the exponent of the multinormal distribution is

$$y'V^{-1}y = z'T'(T')^{-1}T^{-1}Tz = z'z.$$

We then have, from (35.14),

$$Q = z'T'ATz = \sum_{i=1}^{k} z'T'A_iTz = \sum_{i=1}^{k} Q_i,$$

and these are the quadratic forms with which we now deal. Condition (b) of 35.5
is now

$$T'A_iT = T'A_iT \cdot T'A_iT$$

or

$$A_iV = A_iVA_iV,$$

so that $A_iV$ must now be idempotent, as must AV. Condition (c) is

$$T'A_iT \cdot T'A_jT = 0$$

or

$$A_iVA_j = 0, \qquad i \neq j.$$

Condition (a) is unaffected by orthogonal transformation.
We may therefore finally state the general result:
If y is non-singularly multinormal with moments (35.13), and the decomposition
(35.14) holds for a quadratic form Q where AV is idempotent, then Q is a $\chi'^2(r, \mu'A\mu)$
variable, where r is the rank of A, and any one of the three following conditions implies
the others:

(a) $\sum_{i=1}^{k} r_i = r$.

(b) Each $A_iV$ is idempotent; equivalently, each $Q_i$ is $\chi'^2(r_i, \mu'A_i\mu)$, where $r_i$ is
the rank of $A_i$.

(c) $A_iVA_j = 0$; equivalently, each $Q_i$ is independent of every other.


Graybill and Marsaglia (1957) give some more general results than this. Banerjee 
(1964) simplifies their proofs. 


35.8 These general results on the decomposition of quadratic forms in normal 
variables solve the problems in 35.2-3, which motivated our investigation. For 
example, it now follows that the numerator of (35.6) is independent of the denominator, 
so that their ratio is duly distributed in the non-central F form [which we write, as
in 24.31, $F'(\nu_1, \nu_2, \lambda)$] where $\nu_1 = k$, $\nu_2 = n-k$ and $\lambda$ is given by (35.4) in this case.
Similarly, we now see that for any individual $\theta_i$ in the orthogonal model of 35.3, the
ratio

$$F = c_{ii}\hat\theta_i^2 \,/\, \{(y - X\hat\theta)'(y - X\hat\theta)/(n-k)\} \tag{35.15}$$

is a $F'(1, n-k, c_{ii}\theta_i^2/\sigma^2)$ variable, and may then be used to test hypotheses concerning $\theta_i$.

More generally, for a hypothesis $H_0$ imposing $r \leq k$ constraints, the ratio of the SS
due to $H_0$ and the Residual SS, multiplied by $(n-k)/r$, is a $F'(r, n-k, \lambda)$ variable
(cf. 24.29-31). It should be particularly noted from 24.29 that the non-central
parameter $\lambda$ is always of exactly the same form as the numerator SS of the test statistic
with each observation replaced by its expectation, and $\sigma^2$ as a divisor. Thus we may
always obtain $\lambda$ very simply from the numerator SS in the test statistic by substituting
$\theta$ for $\hat\theta$ and dividing by $\sigma^2$.

These are examples of the LR test in the linear model, derived generally through 
the canonical form of the model in 24.25-9. The discussion below (24.100) explicitly 
pointed out that the LR test of a linear hypothesis concerning any subset of r of the k 
parameters is based upon the reduction in the SS due to these r parameters divided 
by the Residual SS. The canonical approach of Chapter 24 had its theoretical uses 
in the derivation of optimum properties of LR tests in 24.36-7. For our present 
purposes, the equivalent partitioning of SS which we have been discussing is more 
direct and informative. 

We remind the reader that exact and approximate expressions for the power function 
of the LR F-tests are given in 24.32-3. 
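
The substitution rule of 35.8 for the non-central parameter lends itself to a short sketch. Taking as numerator SS the between-groups sum of squares of a one-way classification (the case treated in Example 35.1), the hypothetical helper `noncentrality` below simply evaluates that same SS with every observation replaced by its expectation and divides by $\sigma^2$. The function names and the numerical values are assumptions made for illustration.

```python
def between_groups_ss(groups):
    """Sum over groups of n_i * (group mean - overall mean)^2."""
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    return sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)


def noncentrality(theta, sizes, sigma2):
    """Numerator SS evaluated at the expectations of the observations, divided by sigma^2."""
    expected = [[t] * m for t, m in zip(theta, sizes)]  # E(y) has theta_i repeated n_i times
    return between_groups_ss(expected) / sigma2


# Hypothetical group means, group sizes and error variance.
theta = [1.0, 2.0, 4.0]
sizes = [2, 3, 1]
lam = noncentrality(theta, sizes, sigma2=2.0)
```

The same function `between_groups_ss` serves for both the test statistic (applied to the data) and its non-central parameter (applied to the expectations), which is exactly the point of the rule.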


AV for classified observations 

35.9 Our definition of AV in 35.4 applies to any linear model, and covers the 
applications to regression theory in 28.12-23. However, the term AV is commonly 
used in a narrower sense, in which it was originally developed. 

We saw in 35.4 that AV is used to separate out the influences of different parameters 
upon y. In experimental work, the parameters are often the effects of certain
"treatments" upon the variable y. For example, in agricultural experimentation, from
which this terminology derives, y might be the yield of wheat from a plot of fixed size,
and the "treatment" being investigated might be the addition of a certain fertilizer
to the plot during the growing season. Naturally, the experiment would include both
treated and untreated plots. The point here is that such an experiment may be brought
within the scope of the general linear model by defining a "label" variable x which is
equal to 1 when the treatment is given and 0 otherwise.

It is easy to see that any pattern of treatments can be handled in this way; we need 
only define a label variable x for each possible ingredient of the treatments in the 
experiment. If there are two fertilizers in the example of the previous paragraph, we 
should define $x_1$ as the label variable for the first and $x_2$ as the label variable for the
second fertilizer. Thus, a plot which receives both fertilizers has $x_1 = x_2 = 1$; one
which receives only the first has $x_1 = 1$, $x_2 = 0$; a plot which receives only the second
fertilizer has $x_1 = 0$, $x_2 = 1$; and a plot receiving no fertilizer has $x_1 = x_2 = 0$. The
analysis of the linear model can now proceed without difficulty, since the elements of X 
may be any known constants. 
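
The labelling just described amounts to writing down one row of X for each plot. A minimal sketch in plain Python follows; the plot descriptions and the function name `label_row` are hypothetical, and only the two label columns $x_1$, $x_2$ are built.

```python
def label_row(first_fertilizer, second_fertilizer):
    """Row (x1, x2) of X: each label is 1 if the fertilizer is applied and 0 otherwise."""
    return [1 if first_fertilizer else 0, 1 if second_fertilizer else 0]


# Hypothetical plots: (description, receives first fertilizer, receives second fertilizer).
plots = [
    ("both fertilizers", True, True),
    ("first only", True, False),
    ("second only", False, True),
    ("no fertilizer", False, False),
]
X = [label_row(first, second) for _, first, second in plots]
```

The resulting rows are (1, 1), (1, 0), (0, 1) and (0, 0), in the order of the plots listed.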
35.10 The feature of the matrix X in the examples discussed in 35.9 is that all
its elements are units or zeros, since they are merely labels for the presence or absence
of certain ingredients in the "treatments." In the narrower sense, the term AV
is used to describe the analysis of a linear model when this restriction holds true for
all the elements of X. Other small positive integers are also permitted in X in this
narrower sense of AV. For example, in the single-fertilizer experiment discussed at
the beginning of 35.9, some plots might be given a single dose, others a double dose,
and others none at all of the fertilizer. We could then define x = 2, 1 or 0 accordingly;
the analysis of this model could still be called AV. However, this formulation suffers
from the fact that it implies that E(y) is affected twice as much by a double as by a
single dose, because the model is linear in $\beta$, the "effect" parameter expressing the
dependence of y upon x. This could be overcome by defining two label variables, $x_1$ to denote
presence or absence of a single dose, and $x_2$ to denote presence or absence of a double
dose of the fertilizer. This alternative formulation does not (as the reader may be
tempted to think) reduce the model to the two-fertilizer model at the end of 35.9, for
we cannot now have $x_1 = x_2 = 1$ for any plot; there is evidently some loss of symmetry
to offset the avoidance of the implication of linearity in dose-effect.
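
The two formulations of the dose experiment may be contrasted in a short sketch (plain Python). The function names, the baseline term and the numerical effects are assumptions introduced only for the illustration; they are not values from the text.

```python
def linear_dose_row(dose):
    """Single label variable x = 0, 1 or 2: the effect is forced to be proportional to dose."""
    return [dose]


def indicator_dose_row(dose):
    """Two label variables: x1 = 1 for a single dose, x2 = 1 for a double dose."""
    return [1 if dose == 1 else 0, 1 if dose == 2 else 0]


def expected_yield(row, effects, baseline=0.0):
    """E(y) for one plot: baseline plus the sum of label * effect."""
    return baseline + sum(x * e for x, e in zip(row, effects))


doses = [0, 1, 2]
linear_rows = [linear_dose_row(d) for d in doses]
indicator_rows = [indicator_dose_row(d) for d in doses]

# Hypothetical effects: 3.0 per unit dose in the linear form;
# 3.0 for a single and 4.0 for a double dose in the indicator form.
linear_means = [expected_yield(r, [3.0], baseline=10.0) for r in linear_rows]
indicator_means = [expected_yield(r, [3.0, 4.0], baseline=10.0) for r in indicator_rows]
```

In the linear form the double dose necessarily raises E(y) by twice the single-dose amount; in the indicator form the two increments are free, and no row ever has both labels equal to 1.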


35.11 We shall be discussing the formulation of linear models in several important 
AV situations, and we shall see that the simple (usually 0-1) structure of the elements 
of X produces corresponding simplifications in the analysis itself. The simplest case 
is that of a classification of observations into groups, suspected to differ in their means; 
this is usually known as a one-way classification. 


Example 35.1 AV in a one-way classification 
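Suppose that a sample of independent observations is classified into k groups, with
$n_i$ (i = 1, 2, ..., k) observations in the ith group and $\sum_{i=1}^{k} n_i = n$. If the groups can
differ only in their means, we may express this as

$$y_{iq} = \theta_i + \epsilon_{iq}, \qquad i = 1, 2, \ldots, k; \quad q = 1, 2, \ldots, n_i,$$

which is in the form of the general linear model

$$y = X\theta + \epsilon,$$

where

$$\underset{(n\times 1)}{y} = \begin{pmatrix} y_{11} \\ \vdots \\ y_{1n_1} \\ y_{21} \\ \vdots \\ y_{kn_k} \end{pmatrix}, \qquad
\underset{(k\times 1)}{\theta} = \begin{pmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_k \end{pmatrix}, \qquad
\underset{(n\times 1)}{\epsilon} = \begin{pmatrix} \epsilon_{11} \\ \vdots \\ \epsilon_{1n_1} \\ \epsilon_{21} \\ \vdots \\ \epsilon_{kn_k} \end{pmatrix},$$

$$\underset{(n\times k)}{X} = \begin{pmatrix} \mathbf{1}_{n_1} & & & \\ & \mathbf{1}_{n_2} & & \\ & & \ddots & \\ & & & \mathbf{1}_{n_k} \end{pmatrix},$$

$\mathbf{1}_{n_i}$ denoting a column of units occupying $n_i$ rows.
The zero elements of X are omitted. We see at once that

$$X'X = \begin{pmatrix} n_1 & & & 0 \\ & n_2 & & \\ & & \ddots & \\ 0 & & & n_k \end{pmatrix},$$

so that the analysis is orthogonal (cf. 35.3) whatever the values of the $n_i$; in particular,
they need not be equal. Also

$$X'y = \begin{pmatrix} \sum_q y_{1q} \\ \sum_q y_{2q} \\ \vdots \\ \sum_q y_{kq} \end{pmatrix},$$

so the LS estimator of $\theta$ is

$$\hat\theta = (X'X)^{-1}X'y = \begin{pmatrix} y_{1.} \\ y_{2.} \\ \vdots \\ y_{k.} \end{pmatrix},$$

where $y_{i.} = \sum_{q=1}^{n_i} y_{iq}/n_i$ is the mean(*) of the observations in the ith group. The estimator
is in accordance with intuition since the observations are independent. We have

$$X\hat\theta = \begin{pmatrix} y_{1.}\mathbf{1}_{n_1} \\ y_{2.}\mathbf{1}_{n_2} \\ \vdots \\ y_{k.}\mathbf{1}_{n_k} \end{pmatrix},$$

(*) It should be observed that we are using the dot suffix on y to denote averaging, and not
to denote summation as we did for frequencies and probabilities in Chapter 33.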
and the SS due to the fitted model as a whole is

$$S_1 = (X\hat\theta)'(X\hat\theta) = \sum_{i=1}^{k} n_i y_{i.}^2. \tag{35.16}$$

By subtraction, the Residual SS is, from (35.1),

$$S_R = y'y - S_1 = \sum_{i=1}^{k}\sum_{q=1}^{n_i} y_{iq}^2 - \sum_{i=1}^{k} n_i y_{i.}^2 = \sum_{i=1}^{k}\sum_{q=1}^{n_i} (y_{iq} - y_{i.})^2. \tag{35.17}$$

To test the simple hypothesis imposing k constraints,

$$H_0 : \theta = 0, \tag{35.18}$$

we use (35.6) and obtain

$$F = \left(\frac{n-k}{k}\right)\frac{S_1}{S_R} \tag{35.19}$$

distributed as a $F'(k, n-k, \sum_{i=1}^{k} n_i\theta_i^2/\sigma^2)$ variable, reducing to a (central) $F(k, n-k)$ variable
if $H_0$ holds.

However, (35.18) is not the hypothesis of principal interest in most practical
situations, where we usually wish to test whether the $\theta_i$ are all equal without specifying
their common value. Instead of (35.18), we therefore test the composite hypothesis

$$H_0 : \theta_1 - \theta_k = \theta_2 - \theta_k = \ldots = \theta_{k-1} - \theta_k = 0, \tag{35.20}$$

which imposes only (k-1) constraints. If (35.20) holds, the n observations are
identically distributed with common mean

$$\theta_. = \sum_{i=1}^{k} n_i\theta_i/n = \begin{pmatrix} n_1/n \\ n_2/n \\ \vdots \\ n_k/n \end{pmatrix}' \theta.$$

The LS estimator of $\theta_.$ is then the overall sample mean

$$y_{..} = \sum_{i=1}^{k}\sum_{q=1}^{n_i} y_{iq}/n = \sum_{i=1}^{k} n_i y_{i.}/n.$$

If 1 is a (n x 1) vector of units, we may rewrite the linear model temporarily in the
(singular) form

$$y = X\theta + \epsilon = \mathbf{1}\theta_. + \left\{ X - \mathbf{1}\begin{pmatrix} n_1/n \\ n_2/n \\ \vdots \\ n_k/n \end{pmatrix}' \right\}\theta + \epsilon$$

and observe that the value of $\theta_.$ (a single constraint) is not involved in the hypothesis
(35.20), and that the SS attributable to $\theta_.$, namely

$$(\mathbf{1}\hat\theta_.)'(\mathbf{1}\hat\theta_.) = n y_{..}^2,$$

must be subtracted from (35.16) (the SS due to the fitted model as a whole) to give the
SS due to the other (k-1) constraints. This is

$$S_2 = S_1 - n y_{..}^2 = \sum_{i=1}^{k} n_i (y_{i.} - y_{..})^2. \tag{35.21}$$

The Residual SS is given by $S_R$, defined at (35.17), as before.
35.8 now gives for the test statistic

$$F = \left(\frac{n-k}{k-1}\right)\frac{S_2}{S_R} \tag{35.22}$$

which is distributed as a $F'\{k-1, n-k, \sum_{i=1}^{k} n_i(\theta_i - \theta_.)^2/\sigma^2\}$ variable, reducing to a
central $F(k-1, n-k)$ when $H_0$ holds.
For computational purposes, $S_2$ and $S_R$ are usually written as

$$S_2 = \sum_{i=1}^{k} n_i y_{i.}^2 - n y_{..}^2, \qquad S_R = \sum_{i=1}^{k}\sum_{q=1}^{n_i} y_{iq}^2 - \sum_{i=1}^{k} n_i y_{i.}^2, \tag{35.23}$$

and the results assembled in a table:


AV table for a one-way classification

Variation         D.fr.    SS                                   Mean square (MS) = SS/d.fr.
Between groups    k-1      $S_2$                                $S_2/(k-1)$
Residual          n-k      $S_R$                                $S_R/(n-k)$
                  n-1      $S_2 + S_R = y'y - n y_{..}^2$
General mean      1        $S_1 - S_2 = n y_{..}^2$             $n y_{..}^2$
TOTAL             n        $S_1 + S_R = y'y$
                                                                (35.24)


The "General mean" row of (35.24) is usually omitted as of no interest; the
variance-ratio test based on the ratio of $n(y_{..} - \theta_.)^2$ to $S_R/(n-k)$ is, of course, the ordinary
"Student's" $t^2$ test for the mean, i.e. it has a $F(1, n-k)$ distribution when $H_0$ holds.
The test (35.22) is simply the ratio of the "Between groups" MS to the Residual MS,
while (35.19) is obtained by adding together the "Between groups" and "General
mean" rows of the table and taking the ratio of the resulting MS to the Residual MS.

AV identities and their geometrical interpretations 


35.12 The general theory of the linear model has been used in Example 35.1,
but the final result can be less formally derived as follows. The identity

$$\sum_{i=1}^{k}\sum_{q=1}^{n_i} (y_{iq} - y_{..})^2 = \sum_{i=1}^{k}\sum_{q=1}^{n_i} (y_{iq} - y_{i.})^2 + \sum_{i=1}^{k} n_i (y_{i.} - y_{..})^2 \tag{35.25}$$

splits the SS of the observations about their overall mean into a SS "within groups"
and a SS "between groups" (i.e. between group means). If it can be verified that
the two sums on the right of (35.25) are independently distributed in the $\chi'^2$ form, the
ratio of the second to the first is an intuitively acceptable criterion for testing the
equality of the group means in the population. This approach leads to (35.22) as
before, but it offers no direct justification for the choice of this particular test statistic, 
for which the general theory is necessary. In more complicated situations, the approach 
through algebraic identities like (35.25) is often much simpler and quicker than the 
direct use of linear model theory, but care is necessary in splitting the SS—ultimately, 
safety lies only in checking with the general theory. 


35.13 The Pythagorean form of (35.25) has the virtue of drawing attention to a
geometrical interpretation of the algebraic partitioning of the SS which is the essence
of AV. We saw in Example 11.7 that the simpler identity (for a single group of
observations)

$$\sum y^2 = \sum (y - \bar{y})^2 + n\bar{y}^2 \tag{35.26}$$

is geometrically equivalent to projecting the point $y = (y_1, y_2, \ldots, y_n)$ in the
n-dimensional sample space upon the equiangular vector, which it meets at $(\bar{y}, \bar{y}, \ldots, \bar{y})$, and
using Pythagoras' theorem in the resulting right-angled triangle. In the more general
notation which we have been using in Example 35.1 and in (35.25), (35.26) is

$$\sum_{i=1}^{k}\sum_{q=1}^{n_i} y_{iq}^2 = \sum_{i=1}^{k}\sum_{q=1}^{n_i} (y_{iq} - y_{..})^2 + n y_{..}^2 \tag{35.27}$$

and is therefore seen to be equivalent to the splitting-off of the "general mean" row
from the Total SS in (35.24) to give the left-hand side of (35.25). The further
decomposition in (35.25) of the first term on the right of (35.27) is similarly geometrically
interpretable, $\sum_i \sum_q (y_{iq} - y_{i.})^2$ being the squared distance from y to the vector $X\hat\theta$ defined
above (35.16), and $\sum_i n_i (y_{i.} - y_{..})^2$ being the squared distance from $X\hat\theta$ to the equiangular


vector. From the geometrical standpoint, therefore, AV is seen to consist of a resolu- 
tion of the distance from y to the origin into a number of components relevant to the 
problem in hand. 
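
The geometrical reading of (35.25) and (35.27) can be made concrete with a small sketch in plain Python. The data and helper names are assumptions; the three points are the observation vector y, the vector of fitted group means, and the foot of the perpendicular on the equiangular vector.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points of the sample space."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


# Illustrative data (an assumption): two groups of two observations each.
groups = [[2.0, 4.0], [5.0, 9.0]]

y = [v for g in groups for v in g]
n = len(y)
grand = sum(y) / n
fitted = [sum(g) / len(g) for g in groups for _ in g]    # group mean repeated n_i times
equiangular = [grand] * n                                # projection of y on the equiangular vector

within = sq_dist(y, fitted)                 # squared distance from y to the fitted vector
between = sq_dist(fitted, equiangular)      # squared distance from the fitted vector to the equiangular vector
about_mean = sq_dist(y, equiangular)        # left-hand side of (35.25)
to_origin = sq_dist(y, [0.0] * n)           # y'y

# The two legs are at right angles: their inner product vanishes.
inner = sum((a - b) * (b - c) for a, b, c in zip(y, fitted, equiangular))
```

The vanishing inner product is what justifies the use of Pythagoras' theorem at each stage of the resolution.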


35.14 The fact that in Example 35.1 we obtained an orthogonal analysis for any
classification into groups, no matter what the sizes $n_i$ were, encourages us to investigate
more complex classification systems. We shall find that orthogonality does not
generally persist for unequal group-sizes, but does so when the sizes are equal. We first
treat the case of a two-way classification in detail, since it exhibits most of the points
of general interest.


AV in a two-way cross-classification 
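35.15 Suppose that, instead of being simply classified into k groups as in
Example 35.1, a sample of n observations is classified in a r x c table with frequencies

$$\begin{array}{cccc|c}
n_{11} & n_{12} & \cdots & n_{1c} & n_{1.} \\
n_{21} & n_{22} & \cdots & n_{2c} & n_{2.} \\
\vdots & \vdots & & \vdots & \vdots \\
n_{r1} & n_{r2} & \cdots & n_{rc} & n_{r.} \\ \hline
n_{.1} & n_{.2} & \cdots & n_{.c} & n
\end{array} \tag{35.28}$$
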
Although (35.28) is formally identical with (33.60), our present problem is distinguished
from those of Chapter 33 by the fact that the value of y is here known for each
observation, whereas there only the frequencies in the cells of the table were known. We
express this distinction by referring here to a r x c classification as opposed to the term
categorization used in Chapter 33. We continue the convention of Chapter 33 that a
dot replacing a suffix to n denotes summation over that suffix, while for the variable y
we continue the convention set up in Example 35.1 that a dot replacing a suffix denotes
averaging over that suffix. Together, these two conventions simplify the notation in
what follows. The reader will see that the grand total frequency in (35.28) should
strictly be written $n_{..}$, but we continue to write n instead in this one case to denote
"sample size."

We may, of course, treat the rc cells in the body of the table (35.28) as a one-way
classification (Example 35.1) with k = rc. However, the questions which are usually
asked about the two-way cross-classification (35.28) are:

(1) Do the means of the row-classification (with frequencies $n_{1.}, n_{2.}, \ldots, n_{r.}$) differ?
(2) Do the means of the column-classification (with frequencies $n_{.1}, n_{.2}, \ldots, n_{.c}$) differ?
(3) Is there any interrelation between row- and column-means?

More rarely, we ask also

(4) Does the mean of the whole set of n observations differ from some hypothetical
value?


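35.16 Denote the pth observation in the ith row and jth column of the table by
$y_{ijp}$. We then have, in our notational convention,

$$\begin{aligned}
y_{ij.} &= \sum_{p=1}^{n_{ij}} y_{ijp}/n_{ij}, \\
y_{i..} &= \sum_{j=1}^{c}\sum_{p=1}^{n_{ij}} y_{ijp}/n_{i.} = \sum_{j=1}^{c} n_{ij}y_{ij.}/n_{i.} = \sum_{j=1}^{c} n_{ij}y_{ij.} \Big/ \sum_{j=1}^{c} n_{ij}, \\
y_{.j.} &= \sum_{i=1}^{r}\sum_{p=1}^{n_{ij}} y_{ijp}/n_{.j} = \sum_{i=1}^{r} n_{ij}y_{ij.}/n_{.j} = \sum_{i=1}^{r} n_{ij}y_{ij.} \Big/ \sum_{i=1}^{r} n_{ij}, \\
y_{...} &= \sum_{i=1}^{r}\sum_{j=1}^{c}\sum_{p=1}^{n_{ij}} y_{ijp}/n = \sum_{i=1}^{r} n_{i.}y_{i..}/n = \sum_{j=1}^{c} n_{.j}y_{.j.}/n.
\end{aligned} \tag{35.29}$$

An easy way of avoiding any possible confusion in notation is to define a dummy
variable $n_{ijp}$ which is identically equal to 1 for all $p = 1, 2, \ldots, n_{ij}$. Then (35.29)
becomes

$$\begin{aligned}
y_{ij.} &= \sum_{p} y_{ijp} \Big/ \sum_{p} n_{ijp}, \\
y_{i..} &= \sum_{j}\sum_{p} y_{ijp} \Big/ \sum_{j}\sum_{p} n_{ijp}, \\
y_{.j.} &= \sum_{i}\sum_{p} y_{ijp} \Big/ \sum_{i}\sum_{p} n_{ijp}, \\
y_{...} &= \sum_{i}\sum_{j}\sum_{p} y_{ijp} \Big/ \sum_{i}\sum_{j}\sum_{p} n_{ijp},
\end{aligned} \tag{35.30}$$

which is easily remembered by its numerator-denominator symmetry. In (35.30), and
hereafter unless otherwise stated, i is always summed from 1 to r; j is summed from
1 to c; and p is summed from 1 to $n_{ij}$.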
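
The averaging convention of (35.29)-(35.30) is simply "sum the observations in the relevant cells and divide by the number of them". A minimal sketch in plain Python follows; the dictionary layout, the function name `mean_over` and the data are assumptions made for illustration.

```python
def mean_over(cells, row=None, col=None):
    """Average of all observations whose cell matches the given row and/or column.

    Leaving row (or col) as None corresponds to a dot replacing that suffix.
    """
    values = [
        v
        for (i, j), obs in cells.items()
        for v in obs
        if (row is None or i == row) and (col is None or j == col)
    ]
    return sum(values) / len(values)


# Illustrative 2 x 2 classification with unequal cell frequencies (an assumption).
cells = {
    (1, 1): [2.0, 4.0],
    (1, 2): [6.0],
    (2, 1): [1.0],
    (2, 2): [3.0, 5.0, 7.0],
}

y_11 = mean_over(cells, row=1, col=1)   # y_11.
y_1 = mean_over(cells, row=1)           # y_1..
y_col1 = mean_over(cells, col=1)        # y_.1.
y_all = mean_over(cells)                # y_...

# The row mean is also the frequency-weighted mean of the cell means, as in (35.29).
weighted = (2 * mean_over(cells, 1, 1) + 1 * mean_over(cells, 1, 2)) / 3
```

Counting the selected observations in the denominator plays exactly the part of the dummy variable $n_{ijp}$ in (35.30).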


35.17 In formulating the linear model, we require one parameter for the mean
of the observations in each cell of the r x c table. In order to answer the questions
posed in 35.15, however, we express the mean $\mu_{ij}$ in a cell in terms of:

$\mu_{..}$, a mean common to all observations;
$\mu_{i.}$, a mean common to all observations in the ith row;
$\mu_{.j}$, a mean common to all observations in the jth column.

Since we already have the cell means $\mu_{ij}$, common to observations in the ith row
and the jth column, we now have 1+r+c+rc parameters, of which only rc can be
linearly independent. The singularity which we have thus introduced by our choice
of parameters can easily be removed, either by the augmentation technique of 19.14-15,
Vol. 2, or by eliminating the redundant parameters, as we shall do here.

Once $\mu_{..}$ is defined, any (r-1) of the means $\mu_{i.}$ determine the other one; similarly,
only (c-1) of the means $\mu_{.j}$ need be considered, since they with $\mu_{..}$ will determine the
other one. Once the $\mu_{i.}$ and $\mu_{.j}$ are thus determined, it is easy to see that only
(r-1)(c-1) of the $\mu_{ij}$ can be independently determined (cf. the d.fr. in 33.29). We
may thus confine ourselves to (r-1) parameters $\mu_{i.}$ (omitting $\mu_{r.}$, say), to (c-1)
parameters $\mu_{.j}$ (omitting $\mu_{.c}$, say), and to (r-1)(c-1) parameters $\mu_{ij}$ (say, i = 1, 2, ..., r-1
and j = 1, 2, ..., c-1). These, with $\mu_{..}$, make up the rc parameters required for the
model to be non-singular.

It should be noticed that we do not define the parameters $\mu_{..}$, $\mu_{i.}$, $\mu_{.j}$ except to
state that they are (weighted) means of the $\mu_{ij}$.


35.18 We now define

$$\begin{aligned}
\theta_{..} &= \mu_{..}, \\
\theta_{i.} &= \mu_{i.} - \mu_{..}, \\
\theta_{.j} &= \mu_{.j} - \mu_{..}, \\
\theta_{ij} &= \mu_{ij} - (\mu_{i.} + \mu_{.j}) + \mu_{..},
\end{aligned} \tag{35.31}$$

and write the linear model in the form

$$y_{ijp} = \theta_{..} + \theta_{i.} + \theta_{.j} + \theta_{ij} + \epsilon_{ijp} = \mu_{ij} + \epsilon_{ijp}. \tag{35.32}$$

For obvious reasons, $\theta_{..}$ is called the general mean, and $\theta_{i.}$, $\theta_{.j}$ are respectively called
the ith row-effect and the jth column-effect, measuring the deviation from the general
mean in a particular row or column. If the deviation of the cell-mean from the general
mean were exactly equal to the sum of the corresponding row-effect and column-effect,
we should have

$$\mu_{ij} - \mu_{..} = (\mu_{i.} - \mu_{..}) + (\mu_{.j} - \mu_{..}),$$

which implies $\theta_{ij} = 0$. We then say, in accordance with ordinary usage, that the ith
row and jth column "act additively" or "do not interact." $\theta_{ij}$ as defined in (35.31)
measures departures from this situation, and is called the interaction between the ith
row and the jth column.
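
The definitions (35.31) and the reconstitution (35.32) can be sketched in plain Python. Since the text leaves the weighting of the means $\mu_{..}$, $\mu_{i.}$, $\mu_{.j}$ unspecified, the sketch assumes equal weights purely for illustration; the function name `effects` and the tables of cell means are likewise hypothetical.

```python
def effects(mu):
    """General mean, row-effects, column-effects and interactions from a table of cell means.

    Assumption: the marginal means are taken as equally weighted averages of the cell means.
    """
    r, c = len(mu), len(mu[0])
    row_mean = [sum(mu[i]) / c for i in range(r)]
    col_mean = [sum(mu[i][j] for i in range(r)) / r for j in range(c)]
    general = sum(row_mean) / r
    row_eff = [m - general for m in row_mean]
    col_eff = [m - general for m in col_mean]
    inter = [
        [mu[i][j] - (row_mean[i] + col_mean[j]) + general for j in range(c)]
        for i in range(r)
    ]
    return general, row_eff, col_eff, inter


mu = [[10.0, 12.0], [14.0, 20.0]]            # hypothetical cell means with interaction
general, row_eff, col_eff, inter = effects(mu)

# Reconstitute each cell mean from its four components, as in (35.32).
rebuilt = [
    [general + row_eff[i] + col_eff[j] + inter[i][j] for j in range(2)]
    for i in range(2)
]

additive = [[10.0, 12.0], [14.0, 16.0]]      # hypothetical cell means acting additively
_, _, _, inter_additive = effects(additive)
```

Whatever weights are used for the means, the four components add back to the cell mean; it is only the numerical values of the separate effects that depend on the weighting.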


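35.19 The (r+c+1) linear relations between the parameters, discussed in 35.17,
may now be written (but we shall return to this subject in 35.26-8 below)

$$\begin{aligned}
0 &= \sum_{i=1}^{r} n_{i.}\theta_{i.} = \sum_{j=1}^{c} n_{.j}\theta_{.j} \\
  &= \sum_{i=1}^{r} n_{ij}\theta_{ij}, \qquad j = 1, 2, \ldots, c-1, \\
  &= \sum_{j=1}^{c} n_{ij}\theta_{ij}, \qquad i = 1, 2, \ldots, r-1, \\
  &= \sum_{i=1}^{r}\sum_{j=1}^{c} n_{ij}\theta_{ij}.
\end{aligned} \tag{35.33}$$

If, as in 35.17, we define the parameters $\theta$ in (35.32) for i = 1, 2, ..., r-1 and
j = 1, 2, ..., c-1 only, the eliminated (r+c+1) parameters may be expressed in
terms of the others, using (35.33), as

$$\begin{aligned}
\theta_{r.} &= -\sum_{i=1}^{r-1} n_{i.}\theta_{i.}/n_{r.}, \\
\theta_{.c} &= -\sum_{j=1}^{c-1} n_{.j}\theta_{.j}/n_{.c}, \\
\theta_{rj} &= -\sum_{i=1}^{r-1} n_{ij}\theta_{ij}/n_{rj}, \qquad j = 1, 2, \ldots, c-1, \\
\theta_{ic} &= -\sum_{j=1}^{c-1} n_{ij}\theta_{ij}/n_{ic}, \qquad i = 1, 2, \ldots, r-1, \\
\theta_{rc} &= +\sum_{i=1}^{r-1}\sum_{j=1}^{c-1} n_{ij}\theta_{ij}/n_{rc}.
\end{aligned} \tag{35.34}$$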


35.20 We may now write down the matrix X of the linear model (35.32). It is 
not a matrix of units and zeros only, because the expression of the eliminated parameters 
in terms of the others, in (35.34), involves various ratios of the n’s. 

To simplify the reader’s verification of the elements of the matrix in (35.35), its 
columns are headed by the parameters to which they correspond and its rows are bor- 
dered by the frequencies in the cells to which they apply. Only non-zero elements of 
X are shown. Throughout the matrix, a vector of units 1 contains a number of
components equal to the sum of the cell-frequencies (in the border of the rows) over which
the vector 1 physically extends in (35.35). 
[Display of the matrix X]    (35.35)
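Premultiplying (35.35) by its transpose, we find that

$$X'X = \begin{pmatrix}
n & 0 & 0 & 0 \\
0 & A & D & 0 \\
0 & D' & B & 0 \\
0 & 0 & 0 & C
\end{pmatrix} \tag{35.36}$$

(the successive row- and column-blocks being of orders 1, (r-1), (c-1) and (r-1)(c-1)),
where A, B and C are symmetric matrices with elements on and above the leading diagonal
given by

$$\underset{(r-1)\times(r-1)}{A} = \begin{pmatrix}
n_{1.} + \dfrac{n_{1.}^2}{n_{r.}} & \dfrac{n_{1.}n_{2.}}{n_{r.}} & \cdots & \dfrac{n_{1.}n_{r-1.}}{n_{r.}} \\
 & n_{2.} + \dfrac{n_{2.}^2}{n_{r.}} & \cdots & \dfrac{n_{2.}n_{r-1.}}{n_{r.}} \\
 & & \ddots & \vdots \\
 & & & n_{r-1.} + \dfrac{n_{r-1.}^2}{n_{r.}}
\end{pmatrix}, \tag{35.37}$$

$$\underset{(c-1)\times(c-1)}{B} = \begin{pmatrix}
n_{.1} + \dfrac{n_{.1}^2}{n_{.c}} & \dfrac{n_{.1}n_{.2}}{n_{.c}} & \cdots & \dfrac{n_{.1}n_{.c-1}}{n_{.c}} \\
 & n_{.2} + \dfrac{n_{.2}^2}{n_{.c}} & \cdots & \dfrac{n_{.2}n_{.c-1}}{n_{.c}} \\
 & & \ddots & \vdots \\
 & & & n_{.c-1} + \dfrac{n_{.c-1}^2}{n_{.c}}
\end{pmatrix}. \tag{35.38}$$
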

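The $(r-1)(c-1)\times(r-1)(c-1)$ matrix C is more complicated. If we label its rows and columns by the suffixes of the $\theta_{ij}$ to which they refer (so that, for example, the 3rd row, cth column would be labelled the (13)th row, (21)th column) then the element in the $(kl)$th row and $(mq)$th column of C is

$$
C_{(kl),(mq)} =
\begin{cases}
n_{kl} + n_{kl}^{2}\left(\dfrac{1}{n_{rl}} + \dfrac{1}{n_{kc}} + \dfrac{1}{n_{rc}}\right) & \text{if } k = m,\ l = q, \\[2ex]
n_{kl}\,n_{kq}\left(\dfrac{1}{n_{kc}} + \dfrac{1}{n_{rc}}\right) & \text{if } k = m,\ l \neq q, \\[2ex]
n_{kl}\,n_{ml}\left(\dfrac{1}{n_{rl}} + \dfrac{1}{n_{rc}}\right) & \text{if } k \neq m,\ l = q, \\[2ex]
\dfrac{n_{kl}\,n_{mq}}{n_{rc}} & \text{if } k \neq m,\ l \neq q.
\end{cases}
\tag{35.39}
$$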
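The only remaining non-null matrix in (35.36) is D, of order $(r-1)\times(c-1)$, whose $(i,j)$th element is

$$
D_{ij} = n_{.j}\left(\frac{n_{ij}}{n_{.j}} - \frac{n_{ic}}{n_{.c}}\right) - \frac{n_{i.}\,n_{.j}}{n_{r.}}\left(\frac{n_{rj}}{n_{.j}} - \frac{n_{rc}}{n_{.c}}\right).
\tag{35.40}
$$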


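35.21 In general, X'X at (35.36) can only be inverted numerically as in the general LS procedure, but inspection reveals that if we can make D = 0, the matrix will be of the form

$$
X'X = \begin{pmatrix} n & & & 0 \\ & A & & \\ & & B & \\ 0 & & & C \end{pmatrix}
\tag{35.41}
$$

whose inverse is simply

$$
(X'X)^{-1} = \begin{pmatrix} n^{-1} & & & 0 \\ & A^{-1} & & \\ & & B^{-1} & \\ 0 & & & C^{-1} \end{pmatrix}.
\tag{35.42}
$$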


We are therefore led to examine the conditions under which D = 0, i.e. every element $D_{ij}$ defined by (35.40) is zero. The structure of $D_{ij}$ makes it evident that this will be so if and only if

$$
\frac{n_{ij}}{n_{.j}} = \frac{n_{ic}}{n_{.c}}, \qquad i = 1, 2, \ldots, r-1;\ j = 1, 2, \ldots, c-1,
$$

and also

$$
\frac{n_{rj}}{n_{.j}} = \frac{n_{rc}}{n_{.c}}, \qquad j = 1, 2, \ldots, c-1.
$$

These conditions are simply that every cell-frequency $n_{ij}$ be proportional to its column-total frequency $n_{.j}$. It follows that every $n_{ij}$ must then also be proportional to its row-total frequency $n_{i.}$, and that we must have

$$
n_{ij} = n_{i.}\,n_{.j}/n, \qquad \text{all } i, j,
\tag{35.43}
$$
for D to be equal to the null matrix. Under this proportional frequencies condition, 
the analysis of the two-way classification becomes relatively simple. 
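The vanishing of D under proportional frequencies can be checked numerically. The sketch below is illustrative only and not part of the original text; the function name `d_matrix` is hypothetical. It assumes the expanded form $D_{ij} = n_{ij} - n_{ic}n_{.j}/n_{.c} - n_{rj}n_{i.}/n_{r.} + n_{rc}n_{i.}n_{.j}/(n_{r.}n_{.c})$, which is what results from multiplying the row-effect and column-effect columns of X when the redundant parameters are eliminated with frequency weights.

```python
from fractions import Fraction


def d_matrix(n):
    """Return the (r-1) x (c-1) matrix D for an r x c table of frequencies n."""
    r, c = len(n), len(n[0])
    row = [sum(n[i]) for i in range(r)]                        # n_i.
    col = [sum(n[i][j] for i in range(r)) for j in range(c)]   # n_.j
    D = []
    for i in range(r - 1):
        D.append([])
        for j in range(c - 1):
            d = (Fraction(n[i][j])
                 - Fraction(n[i][c - 1] * col[j], col[c - 1])
                 - Fraction(n[r - 1][j] * row[i], row[r - 1])
                 + Fraction(n[r - 1][c - 1] * row[i] * col[j],
                            row[r - 1] * col[c - 1]))
            D[-1].append(d)
    return D


# A table with n_ij = n_i. n_.j / n, and the same table with one cell altered.
proportional = [[2, 4, 6], [1, 2, 3], [3, 6, 9]]
disproportional = [[2, 4, 6], [1, 2, 3], [3, 6, 10]]
```

For the first table every element of D is zero; altering a single cell frequency destroys the proportionality and makes D non-null.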


(*) The proportionality condition (35.43) is, the reader may recognize, precisely the condition determining the "independence" frequencies in a contingency table; cf. 33.4 and 33.29. The

The proportional-frequencies case

35.22 We first observe that the form of (35.42) implies that the LS estimator of the general mean $\theta_{..}$ is orthogonal to the estimators of all the other parameters, and
similarly that the (r—1) linearly independent row-effects are estimated orthogonally 
from all other parameters, as are the (c—1) linearly independent column-effects and 
the (r— 1) (c— 1) linearly independent interactions. The only non-orthogonalities occur 
within these last three groups of parameters, and have been imposed by the fact that 
each group has been obtained as a linearly independent subset of the larger (singular) 
group of parameters in which we are interested. 

The reader will, perhaps, have observed that we have not yet evaluated the LS 
estimators themselves. The reason for this is that even when the proportional-frequencies condition (35.43) holds, the elements of C given at (35.39) are not such as to make its inversion simple, although of course we may evaluate $C^{-1}$ numerically in any given situation. Fortunately, however, we may use the orthogonalities referred to in the preceding paragraph to obtain the LS estimators of the row- and column-effects at once, and use them later to evaluate the LS estimators of the interactions. To do this, we need only invert A and B at (35.37-8), and use (35.42) to evaluate the first $1+(r-1)+(c-1) = r+c-1$ rows of the $(rc \times 1)$ LS estimator vector.


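35.23 It is easily verified by matrix multiplication that the inverse of (35.37) is

$$
A^{-1} =
\begin{pmatrix}
\dfrac{1}{n_{1.}}-\dfrac{1}{n} & -\dfrac{1}{n} & \cdots & -\dfrac{1}{n} \\
 & \dfrac{1}{n_{2.}}-\dfrac{1}{n} & \cdots & -\dfrac{1}{n} \\
 & & \ddots & \vdots \\
 & & & \dfrac{1}{n_{r-1,.}}-\dfrac{1}{n}
\end{pmatrix}
\tag{35.44}
$$

and similarly that (35.38) has inverse

$$
B^{-1} =
\begin{pmatrix}
\dfrac{1}{n_{.1}}-\dfrac{1}{n} & -\dfrac{1}{n} & \cdots & -\dfrac{1}{n} \\
 & \dfrac{1}{n_{.2}}-\dfrac{1}{n} & \cdots & -\dfrac{1}{n} \\
 & & \ddots & \vdots \\
 & & & \dfrac{1}{n_{.,c-1}}-\dfrac{1}{n}
\end{pmatrix},
\tag{35.45}
$$

both matrices being symmetric.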
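The verification by matrix multiplication can be carried out mechanically. The following sketch is illustrative and not from the original text; the helper names `a_matrix`, `a_inverse` and `matmul` are hypothetical. It assumes that A has diagonal elements $n_{i.} + n_{i.}^2/n_{r.}$ and off-diagonal elements $n_{i.}n_{k.}/n_{r.}$, and that the claimed inverse has diagonal elements $1/n_{i.} - 1/n$ and off-diagonal elements $-1/n$.

```python
from fractions import Fraction


def a_matrix(row_totals):
    """Build A from the row totals n_1., ..., n_r. (the last one is n_r.)."""
    *head, last = row_totals
    return [[(ni if i == k else 0) + Fraction(ni * nk, last)
             for k, nk in enumerate(head)]
            for i, ni in enumerate(head)]


def a_inverse(row_totals):
    """Build the claimed inverse: diag(1/n_i.) minus 1/n in every position."""
    n = sum(row_totals)
    head = row_totals[:-1]
    return [[(Fraction(1, ni) if i == k else 0) - Fraction(1, n)
             for k in range(len(head))]
            for i, ni in enumerate(head)]


def matmul(P, Q):
    """Product of two square matrices of the same order."""
    m = len(P)
    return [[sum(P[i][t] * Q[t][k] for t in range(m)) for k in range(m)]
            for i in range(m)]
```

Multiplying `a_matrix(t)` by `a_inverse(t)` for any list `t` of positive row totals should return the unit matrix; the same functions serve for B if the column totals are supplied instead.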


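resemblance is merely formal, since in Chapter 33 the $n_{ij}$ were random variables, whereas here they are predetermined constants.

We now require only the first $r+c-1$ rows of the $(rc \times 1)$ vector X'y. From (35.35), these are seen to be, in the notation of (35.29),

$$
(X'y)_{r+c-1} =
\begin{pmatrix}
n\,\bar{y}_{...} \\
n_{1.}(\bar{y}_{1..}-\bar{y}_{r..}) \\
n_{2.}(\bar{y}_{2..}-\bar{y}_{r..}) \\
\vdots \\
n_{r-1,.}(\bar{y}_{r-1,..}-\bar{y}_{r..}) \\
n_{.1}(\bar{y}_{.1.}-\bar{y}_{.c.}) \\
n_{.2}(\bar{y}_{.2.}-\bar{y}_{.c.}) \\
\vdots \\
n_{.,c-1}(\bar{y}_{.,c-1,.}-\bar{y}_{.c.})
\end{pmatrix}.
\tag{35.46}
$$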
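Using (35.42) and (35.44-6) we find, for the first $(r+c-1)$ components of the LS estimator $(X'X)^{-1}X'y$,

$$
\begin{pmatrix}
\hat\theta_{..} \\ \hat\theta_{1.} \\ \vdots \\ \hat\theta_{r-1,.} \\ \hat\theta_{.1} \\ \vdots \\ \hat\theta_{.,c-1}
\end{pmatrix}
=
\begin{pmatrix}
\bar{y}_{...} \\ \bar{y}_{1..}-\bar{y}_{...} \\ \vdots \\ \bar{y}_{r-1,..}-\bar{y}_{...} \\ \bar{y}_{.1.}-\bar{y}_{...} \\ \vdots \\ \bar{y}_{.,c-1,.}-\bar{y}_{...}
\end{pmatrix}.
\tag{35.47}
$$

Thus the LS estimator of the general mean is the overall sample mean, and the LS estimators of row- (or column-) effects are the sample differences between row- (or column-) means and the overall sample mean. It follows at once from (35.47) and the first two linear relationships in (35.34) that the same holds true for the eliminated (redundant) row- and column-effects, i.e. that

$$
\hat\theta_{r.} = \bar{y}_{r..}-\bar{y}_{...}, \qquad \hat\theta_{.c} = \bar{y}_{.c.}-\bar{y}_{...}.
\tag{35.48}
$$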


35.24 Substituting (35.47-8) into the definition of the interactions $\theta_{ij}$ in (35.31), we see that

$$
\hat\theta_{ij} = \hat\mu_{ij} - \bar{y}_{i..} - \bar{y}_{.j.} + \bar{y}_{...},
\tag{35.49}
$$

since $\theta_{ij}$ is a linear function of the other quantities (cf. 19.6, Vol. 2). Now, clearly, from the extreme right of (35.32), we must have the LS estimator

$$
\hat\mu_{ij} = \bar{y}_{ij.}
$$

and thus (35.49) becomes

$$
\hat\theta_{ij} = \bar{y}_{ij.} - \bar{y}_{i..} - \bar{y}_{.j.} + \bar{y}_{...}.
\tag{35.50}
$$

Thus all the parameters are estimated, in this proportional-frequencies case, by the "obvious" intuitive estimators.
Now that the LS estimators of all the parameters in our model are known, we may 
proceed, in Example 35.2, to test the various hypotheses corresponding to the questions 
asked in 35.15. 
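The "obvious" estimators are easily computed from the raw observations. The sketch below is illustrative and not part of the original text; the function name `ls_estimates` and the nested-list data layout (`y[i][j]` holding the observations in cell (i, j)) are assumptions. It returns the overall mean, the row- and column-mean deviations, and the interaction estimates $\bar{y}_{ij.} - \bar{y}_{i..} - \bar{y}_{.j.} + \bar{y}_{...}$; these coincide with the LS estimators only when the frequencies are proportional.

```python
from fractions import Fraction


def ls_estimates(y):
    """Return (general mean, row effects, column effects, interactions).

    y[i][j] is the list of observations in cell (i, j); no cell may be empty.
    """
    r, c = len(y), len(y[0])
    n_ij = [[len(y[i][j]) for j in range(c)] for i in range(r)]
    t_ij = [[sum(map(Fraction, y[i][j])) for j in range(c)] for i in range(r)]
    n = sum(map(sum, n_ij))
    grand = sum(map(sum, t_ij)) / n
    row_mean = [sum(t_ij[i]) / sum(n_ij[i]) for i in range(r)]
    col_mean = [sum(t_ij[i][j] for i in range(r)) /
                sum(n_ij[i][j] for i in range(r)) for j in range(c)]
    rows = [m - grand for m in row_mean]
    cols = [m - grand for m in col_mean]
    inter = [[t_ij[i][j] / n_ij[i][j] - row_mean[i] - col_mean[j] + grand
              for j in range(c)] for i in range(r)]
    return grand, rows, cols, inter


# Hypothetical 2 x 2 data with proportional frequencies 1, 2 / 2, 4.
example = [[[3], [5, 7]], [[2, 4], [6, 8, 10, 12]]]
```

With the example data the estimated row effects, weighted by the row totals 3 and 6, sum to zero, as the frequency-weighted constraints require.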


Example 35.2 Two-way cross-classification with proportional frequencies 
The results of our investigations so far show that the linear model (35.32) for the two-way cross-classification is representable in the non-singular form

$$
y = X\theta + \varepsilon,
\tag{35.51}
$$

where X is defined by (35.35), and $\theta$ is the vector whose transpose heads the columns of X in (35.35).

We first consider question (1) in 35.15. This corresponds to asking for a test of the hypothesis

$$
H_1: \theta_{1.} = \theta_{2.} = \ldots = \theta_{r-1,.} = 0,
\tag{35.52}
$$

imposing $(r-1)$ constraints. Question (2) in 35.15 is similarly equivalent to the $(c-1)$-constraint hypothesis

$$
H_2: \theta_{.1} = \theta_{.2} = \ldots = \theta_{.,c-1} = 0,
\tag{35.53}
$$

and question (3) in 35.15 to the $(r-1)(c-1)$-constraint hypothesis

$$
H_3: \theta_{ij} = 0, \qquad i = 1, 2, \ldots, r-1;\ j = 1, 2, \ldots, c-1.
\tag{35.54}
$$

Finally, question (4) in 35.15 corresponds to the single-constraint hypothesis

$$
H_4: \theta_{..} = 0.
\tag{35.55}
$$


All four hypotheses (35.52-5) are composite. To test any one of them, we must find the SS attributable to that hypothesis, and use the general theory which we have developed, summarized in 35.8. Since the four hypotheses between them account for all rc parameters in the model, and have no parameter in common, we see that we have to partition the SS due to the fitted model as a whole, namely $\hat\theta' X'X\hat\theta$, into the components attributable to the four hypotheses. This is particularly straightforward here, since we have seen in 35.22 that the four groups of parameters are estimated by orthogonal sets of estimators. In fact, X'X was given at (35.41).
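We now write the LS estimators from (35.47) and (35.50) in the form

$$
\hat{\boldsymbol\theta} =
\begin{pmatrix}
\bar{y}_{...} \\
\bar{y}_{i..}-\bar{y}_{...} \\
\bar{y}_{.j.}-\bar{y}_{...} \\
\bar{y}_{ij.}-\bar{y}_{i..}-\bar{y}_{.j.}+\bar{y}_{...}
\end{pmatrix}
=
\begin{pmatrix}
\hat\theta_{..} \\ \hat{\boldsymbol\theta}_{i.} \\ \hat{\boldsymbol\theta}_{.j} \\ \hat{\boldsymbol\theta}_{ij}
\end{pmatrix},
\tag{35.56}
$$

where the subvectors of $\hat{\boldsymbol\theta}$ have 1, $r-1$, $c-1$ and $(r-1)(c-1)$ components respectively. From (35.56) and (35.41), we have the decomposition

$$
\hat{\boldsymbol\theta}' X'X \hat{\boldsymbol\theta} = n\hat\theta_{..}^{2} + \hat{\boldsymbol\theta}_{i.}' A\, \hat{\boldsymbol\theta}_{i.} + \hat{\boldsymbol\theta}_{.j}' B\, \hat{\boldsymbol\theta}_{.j} + \hat{\boldsymbol\theta}_{ij}' C\, \hat{\boldsymbol\theta}_{ij},
\tag{35.57}
$$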
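where A, B, C are the submatrices of (35.41) defined at (35.37-9). The first term on the right of (35.57) is the SS attributable to $H_4$, which we write explicitly as

$$
S_4 = n\,\bar{y}_{...}^{2}.
\tag{35.58}
$$

The matrix A at (35.37) may be written in the form

$$
A = D_n + \mathbf{n}\mathbf{n}',
$$

where $D_n$ is a $(r-1)\times(r-1)$ diagonal matrix with elements $n_{i.}$ and $\mathbf{n}$ is the $(r-1)\times 1$ vector with elements $n_{i.}/\sqrt{n_{r.}}$. The second term on the right of (35.57) is now seen to be

$$
\begin{aligned}
S_1 &= \hat{\boldsymbol\theta}_{i.}'\,(D_n + \mathbf{n}\mathbf{n}')\,\hat{\boldsymbol\theta}_{i.} \\
    &= \hat{\boldsymbol\theta}_{i.}' D_n \hat{\boldsymbol\theta}_{i.} + (\hat{\boldsymbol\theta}_{i.}'\mathbf{n})^{2} \\
    &= \sum_{i=1}^{r-1} n_{i.}(\bar{y}_{i..}-\bar{y}_{...})^{2} + \frac{1}{n_{r.}}\left\{\sum_{i=1}^{r-1} n_{i.}(\bar{y}_{i..}-\bar{y}_{...})\right\}^{2} \\
    &= \sum_{i=1}^{r} n_{i.}(\bar{y}_{i..}-\bar{y}_{...})^{2}.
\end{aligned}
\tag{35.59}
$$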


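This is the SS attributable to $H_1$. In an exactly similar way, using B at (35.38), we find for the third term on the right of (35.57)

$$
S_2 = \sum_{j=1}^{c} n_{.j}(\bar{y}_{.j.}-\bar{y}_{...})^{2},
\tag{35.60}
$$

the SS attributable to $H_2$. Finally, we find from (35.39) that the last term on the right of (35.57) is

$$
\begin{aligned}
S_3 &= \sum_{i=1}^{r-1}\sum_{j=1}^{c-1} n_{ij}\hat\theta_{ij}^{2}
      + \sum_{j=1}^{c-1}\frac{1}{n_{rj}}\left\{\sum_{i=1}^{r-1} n_{ij}\hat\theta_{ij}\right\}^{2}
      + \sum_{i=1}^{r-1}\frac{1}{n_{ic}}\left\{\sum_{j=1}^{c-1} n_{ij}\hat\theta_{ij}\right\}^{2}
      + \frac{1}{n_{rc}}\left\{\sum_{i=1}^{r-1}\sum_{j=1}^{c-1} n_{ij}\hat\theta_{ij}\right\}^{2} \\
    &= \sum_{i=1}^{r}\sum_{j=1}^{c} n_{ij}(\bar{y}_{ij.}-\bar{y}_{i..}-\bar{y}_{.j.}+\bar{y}_{...})^{2},
\end{aligned}
\tag{35.61}
$$

where $\hat\theta_{ij} = \bar{y}_{ij.}-\bar{y}_{i..}-\bar{y}_{.j.}+\bar{y}_{...}$ as at (35.50),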


upon use of linear relationships among the estimated interactions precisely analogous 
to those for the interaction parameters given in (35.34). The four SS defined in 
(35.58-61) exhaust the SS due to the fitted model (35.51). The only other quantity 
we shall require is the Residual SS, which here, as generally, is the difference 
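$$
\begin{aligned}
S_R &= y'y - \hat{\boldsymbol\theta}' X'X \hat{\boldsymbol\theta} \\
    &= \sum_{i}\sum_{j}\sum_{p} y_{ijp}^{2} - (S_1 + S_2 + S_3 + S_4).
\end{aligned}
\tag{35.62}
$$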
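For computational purposes, the other SS are written in the forms

$$
\begin{aligned}
S_4 &= \Bigl(\sum_{i}\sum_{j}\sum_{p} y_{ijp}\Bigr)^{2}\Big/ n, \\
S_1 &= \sum_{i}\Bigl\{\Bigl(\sum_{j}\sum_{p} y_{ijp}\Bigr)^{2}\Big/ n_{i.}\Bigr\} - \Bigl(\sum_{i}\sum_{j}\sum_{p} y_{ijp}\Bigr)^{2}\Big/ n, \\
S_2 &= \sum_{j}\Bigl\{\Bigl(\sum_{i}\sum_{p} y_{ijp}\Bigr)^{2}\Big/ n_{.j}\Bigr\} - \Bigl(\sum_{i}\sum_{j}\sum_{p} y_{ijp}\Bigr)^{2}\Big/ n, \\
S_3 &= \sum_{i}\sum_{j}\Bigl\{\Bigl(\sum_{p} y_{ijp}\Bigr)^{2}\Big/ n_{ij}\Bigr\} - \sum_{i}\Bigl\{\Bigl(\sum_{j}\sum_{p} y_{ijp}\Bigr)^{2}\Big/ n_{i.}\Bigr\} \\
    &\qquad - \sum_{j}\Bigl\{\Bigl(\sum_{i}\sum_{p} y_{ijp}\Bigr)^{2}\Big/ n_{.j}\Bigr\} + \Bigl(\sum_{i}\sum_{j}\sum_{p} y_{ijp}\Bigr)^{2}\Big/ n,
\end{aligned}
\tag{35.63}
$$

which the reader should verify.

On substituting from (35.63), (35.62) becomes simply

$$
S_R = \sum_{i}\sum_{j}\sum_{p} (y_{ijp}-\bar{y}_{ij.})^{2}.
\tag{35.64}
$$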
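The additivity of the five sums of squares can be confirmed on a small data set. The sketch below is illustrative only and not from the original text; the function name `anova_ss` and the example data are hypothetical. It assumes the definitional forms: $S_1$ and $S_2$ are the frequency-weighted squared deviations of row and column means from the overall mean, $S_3$ the weighted squared interaction estimates, $S_4 = n\bar{y}_{...}^2$, and $S_R$ the within-cell sum of squares.

```python
from fractions import Fraction


def anova_ss(y):
    """Return (S1, S2, S3, S4, SR) for data y[i][j] = list of observations."""
    r, c = len(y), len(y[0])
    cells = [(i, j) for i in range(r) for j in range(c)]
    n_ij = {(i, j): len(y[i][j]) for i, j in cells}
    tot = {(i, j): sum(map(Fraction, y[i][j])) for i, j in cells}
    n = sum(n_ij.values())
    g = sum(tot.values()) / n                                   # overall mean
    rm = [sum(tot[i, j] for j in range(c)) /
          sum(n_ij[i, j] for j in range(c)) for i in range(r)]  # row means
    cm = [sum(tot[i, j] for i in range(r)) /
          sum(n_ij[i, j] for i in range(r)) for j in range(c)]  # column means
    cell = {k: tot[k] / n_ij[k] for k in cells}                 # cell means
    s4 = n * g ** 2
    s1 = sum(n_ij[i, j] * (rm[i] - g) ** 2 for i, j in cells)
    s2 = sum(n_ij[i, j] * (cm[j] - g) ** 2 for i, j in cells)
    s3 = sum(n_ij[i, j] * (cell[i, j] - rm[i] - cm[j] + g) ** 2
             for i, j in cells)
    sr = sum((Fraction(v) - cell[i, j]) ** 2
             for i, j in cells for v in y[i][j])
    return s1, s2, s3, s4, sr


# Hypothetical 2 x 2 data with proportional frequencies 1, 2 / 2, 4.
example = [[[3], [5, 7]], [[2, 4], [6, 8, 10, 12]]]
```

For the example data the five components add to the total sum of squares of the observations, 447. With disproportional frequencies the same five quantities would not in general add to the total, which is the point of the proportionality condition.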


We may now, as we did in Example 35.1, assemble the results of our analysis in a table: 


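AV table for a two-way cross-classification with proportional frequencies

Variation          D.fr.           SS                              MS                       Non-central parameter
Between rows       $r-1$           $S_1$                           $S_1/(r-1)$              $\sum_i n_{i.}\theta_{i.}^2/\sigma^2$
Between columns    $c-1$           $S_2$                           $S_2/(c-1)$              $\sum_j n_{.j}\theta_{.j}^2/\sigma^2$
Interactions       $(r-1)(c-1)$    $S_3$                           $S_3/\{(r-1)(c-1)\}$     $\sum_i\sum_j n_{ij}\theta_{ij}^2/\sigma^2$
Residual           $n-rc$          $S_R$                           $S_R/(n-rc)$
(Subtotal)         $n-1$
General mean       1               $S_4$                           $S_4$                    $n\theta_{..}^2/\sigma^2$
Total              $n$             $\sum_i\sum_j\sum_p y_{ijp}^2$
(35.65)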


The general theory of 35.8 tells us that the LR test of any of the hypotheses $H_1$ to $H_4$ is obtained by using the ratio of the corresponding MS in (35.65) to the Residual MS, and rejecting the hypothesis for large values of the ratio. Each of these ratios is a non-central F variate with d.fr. as given in the table and non-central parameter (obtained by using the general rule in 35.8) given in the last column of the table. (As in Example 35.1, the test for the general mean is the usual "Student's" $t^2$-test.) If we may assume all Interactions to be zero, $S_3$ is merged with $S_R$ as a new pooled Residual SS, with $(n-r-c+1)$ d.fr., and the other tests use this MS as denominator.
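The formation of the variance ratios, with or without pooling of the Interaction SS into the Residual, may be sketched as follows. This is an illustrative sketch, not part of the original text; the function name `f_ratios` and the `pool_interactions` flag are hypothetical, and the sums of squares are assumed to have been computed already. No significance points are evaluated, since the F distribution is not available in the Python standard library.

```python
from fractions import Fraction


def f_ratios(s1, s2, s3, s4, sr, r, c, n, pool_interactions=False):
    """Return ({source: F ratio}, residual d.fr.) for the two-way AV table."""
    ss = {"rows": s1, "columns": s2, "interactions": s3, "general mean": s4}
    df = {"rows": r - 1, "columns": c - 1,
          "interactions": (r - 1) * (c - 1), "general mean": 1}
    res_ss, res_df = Fraction(sr), n - r * c
    if pool_interactions:
        # Interactions assumed zero: merge their SS and d.fr. into the Residual.
        res_ss += ss.pop("interactions")
        res_df += df.pop("interactions")
    res_ms = res_ss / res_df
    ratios = {k: (Fraction(ss[k]) / df[k]) / res_ms for k in ss}
    return ratios, res_df
```

With pooling, the residual d.fr. returned are $n-rc+(r-1)(c-1) = n-r-c+1$, in agreement with the text.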
To test the comprehensive hypothesis

$$
H_5: \theta = 0
\tag{35.66}
$$

for all the parameters (which means that $H_1$, $H_2$, $H_3$ and $H_4$ all hold), the same theory tells us that the ratio to be used is

$$
F = \frac{(S_1+S_2+S_3+S_4)/rc}{S_R/(n-rc)},
\tag{35.67}
$$

which is a $F'\bigl(rc,\ n-rc,\ \sum_i\sum_j n_{ij}(\theta_{..}+\theta_{i.}+\theta_{.j}+\theta_{ij})^2/\sigma^2\bigr)$ variable, the non-central parameter being obtained from the last term on the right of (35.62), by substituting $\theta$ for $\hat\theta$ in accordance with the general rule. This test is exactly the one mentioned in 35.15, in which the rc cell-frequencies are treated as a one-way classification and the test (35.19) applied. Similarly, to test that $H_1$, $H_2$ and $H_3$ (but not $H_4$) hold, the numerator of (35.67) is replaced by $(S_1+S_2+S_3)/(rc-1)$, this test being equivalent to (35.22) applied to the rc cell-frequencies.

The equal-frequencies (balanced) case

35.25 The most important case of the proportional-frequencies situation (35.43) arises when all cell-frequencies $n_{ij}$ are equal, say to m. The arithmetic of the computing formulae (35.63-4) then simplifies obviously (cf. Exercise 35.1). The matrix C of (35.39) also now becomes easy to invert and the theory of 35.22-4 correspondingly more direct (cf. Exercise 35.2).

Apart from these simplifications, the only new point arising in this balanced case occurs when m = 1, for then (cf. Exercise 35.1) the Residual SS (35.64) is identically zero, as are its d.fr., $(n-rc)$. Since all the tests given in Example 35.2 then become nugatory, this situation clearly requires special consideration. It is not difficult to see how our problem comes about, for with m = 1, n = rc we are put in the position of having to estimate rc parameters from the same number of observations. Not surprisingly, we can do this exactly, with no residual variation: we are in just the same position as we should be in fitting a polynomial of degree $q-1$ (requiring q constants) to a set of q observations. Thus we can estimate all rc parameters even when m = 1, but only at the expense of seeing our Residual SS disappear.

There is no way out of this difficulty unless we consent to reduce the number of parameters in the model, and what we shall in fact do is to discard the $(r-1)(c-1)$ interaction parameters $\theta_{ij}$, leaving ourselves with $r+c-1$ parameters to be estimated. We shall then have a new Residual SS to replace (35.64), and in fact this will be seen in Example 35.3 to be precisely the former Interaction SS, $S_3$, defined at (35.61).

It should not need to be stressed that this restricted model, without interaction 
parameters, is unsuitable for the analysis of data where interactions do exist. For 
this reason it is inadvisable to restrict ourselves voluntarily to one observation per cell 
of a cross-classification unless we are sure that rows and columns do not interact. 
However, considerations of cost or time sometimes enforce such a restriction. 


Example 35.3 Two-way cross-classification with exactly one observation per cell 


If the interaction parameters $\theta_{ij}$ are dropped from the linear model, we now have, with one observation per cell,

$$
y_{ij} = \theta_{..} + \theta_{i.} + \theta_{.j} + \varepsilon_{ij},
$$

where, to avoid singularities, we define $\theta_{i.}$ for $i = 1, 2, \ldots, r-1$ and $\theta_{.j}$ for $j = 1, 2, \ldots, c-1$, as previously. All the work of 35.17-19 in respect of our present parameters holds good. The matrix X defined at (35.35) remains valid if we use only its first $(r+c-1)$ columns, as does the leading $(r+c-1)\times(r+c-1)$ submatrix of X'X at (35.36), in which we now still have D = 0 since the proportional-frequencies condition (35.43) holds here. A and B at (35.37-8) and their inverses at (35.44-5) are unaffected, as are the vectors (35.46-7), which are now complete instead of partial vectors for the LS estimators of our parameters. We may therefore test the hypotheses $H_1$, $H_2$ and $H_4$ at (35.52-3) and (35.55) exactly as in Example 35.2, the only difference being that what was previously the Interaction SS, $S_3$, now becomes the new Residual SS, for the four SS in the following abbreviated table must add to y'y, as always.


AV table for a two-way cross-classification with one observation per cell

Variation          D.fr.           SS
Between rows       $r-1$           $S_1$
Between columns    $c-1$           $S_2$
Residual           $(r-1)(c-1)$    $S_3$
(Subtotal)         $rc-1$
General mean       1               $S_4$
Total              $n$             $\sum_i\sum_j y_{ij}^2$
(35.68)

The tests of $H_1$, $H_2$ and $H_4$ can now be carried out with the MS $S_3/\{(r-1)(c-1)\}$ as denominator of the F statistic.

Although we have dropped the interaction parameters $\theta_{ij}$ from the model in order to obtain a Residual SS, we can also use their estimators $\hat\theta_{ij}$ to test for zero interactions by separating off from that Residual SS an appropriate component.


Consider the linear form

$$
L = \sum_i\sum_j c_{ij}\hat\theta_{ij},
$$

where $\hat\theta_{ij}$ is defined by (35.50) (the final suffix to the y's is now redundant) and the $c_{ij}$ are coefficients to be determined. If the interactions $\theta_{ij}$ are all zero, but in general not otherwise, E(L) = 0, and it is thus intuitively reasonable to use a statistic of the form L to test the hypothesis of zero interactions. If we choose the $c_{ij}$ so that $\sum_i c_{ij} = \sum_j c_{ij} = 0$, we see from (35.50) that

$$
L = \sum_i\sum_j c_{ij}\,y_{ij}
$$

and hence

$$
\operatorname{var} L = C^2\sigma^2,
$$

where $C^2 = \sum_i\sum_j c_{ij}^2$, and $\sigma^2$ is the error variance as usual. Thus $L^2/(C^2\sigma^2)$ is a $\chi^2$ variable with 1 d.fr. when the interactions are zero. Moreover, our present Residual SS at (35.61) is $S_3 = \sum_i\sum_j \hat\theta_{ij}^2$, and $(S_3 - L^2/C^2)/\sigma^2$ is independent of $L^2/(C^2\sigma^2)$, since the $\hat\theta_{ij}$ can be orthogonally transformed to a set of standardized independent normal variates of which one is $L/(C\sigma)$, and $(S_3 - L^2/C^2)/\sigma^2$ will be the sum of squares of the others, distributed as $\chi^2$ with $(r-1)(c-1)-1 = rc-r-c$ d.fr.

It remains to choose the $c_{ij}$. They can be functions of the $\hat\theta_{i.}$, $\hat\theta_{.j}$, since the latter are distributed independently of the $\hat\theta_{ij}$ by (35.42), and hence the marginal distribution of L will be the same as any of its conditional distributions for fixed $\hat\theta_{i.}$, $\hat\theta_{.j}$, which will be as given above.

A simple choice is $c_{ij} = \hat\theta_{i.}\hat\theta_{.j}$, so that we may define

$$
S_5 = \frac{\bigl(\sum_i\sum_j \hat\theta_{i.}\hat\theta_{.j}\hat\theta_{ij}\bigr)^2}{\sum_i \hat\theta_{i.}^2\,\sum_j \hat\theta_{.j}^2}
    = \frac{\bigl\{\sum_i\sum_j (\bar{y}_{i.}-\bar{y}_{..})(\bar{y}_{.j}-\bar{y}_{..})\,y_{ij}\bigr\}^2}{\sum_i (\bar{y}_{i.}-\bar{y}_{..})^2\,\sum_j (\bar{y}_{.j}-\bar{y}_{..})^2}.
\tag{35.69}
$$

$S_5/\sigma^2$ is a $\chi^2$ variable with 1 d.fr. and $(S_3-S_5)/\sigma^2$ independently a $\chi^2$ variable with $(rc-r-c)$ d.fr. Their ratio, each divided by its d.fr., $(rc-r-c)S_5/(S_3-S_5) = F$, has the variance-ratio distribution with $(1, rc-r-c)$ d.fr., and may be used to test the hypothesis that all interactions are zero. This test for complete additivity of effects was suggested by Tukey (1949), who generalized it further; see Scheffé (1959). M. N. Ghosh and Sharma (1963) studied its power against the alternative that there are interactions of form $\theta_{ij} = a\,\theta_{i.}\theta_{.j}$. For the 6 x 6 classification, the power was found to be of the same order as the F-test for interactions obtained by equating adjacent pairs of the $\theta_{i.}$ and of the $\theta_{.j}$.
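The one-degree-of-freedom statistic is simple to compute for a table with one observation per cell. The following is a minimal sketch, not part of the original text; the function names `tukey_additivity` and `f_statistic` are hypothetical. It assumes the residual sum of squares is $\sum\sum\hat\theta_{ij}^2$, the single-d.fr. component is $L^2/C^2$ with $c_{ij}$ equal to the product of the estimated row and column effects, and the variance ratio is formed with each component divided by its d.fr.

```python
from fractions import Fraction


def tukey_additivity(y):
    """Return (S3, S5) for an r x c table y with one observation per cell.

    Raises ZeroDivisionError if all estimated row effects or all estimated
    column effects are zero, since the component is then undefined.
    """
    r, c = len(y), len(y[0])
    y = [[Fraction(v) for v in row] for row in y]
    g = sum(map(sum, y)) / (r * c)                          # overall mean
    a = [sum(y[i]) / c - g for i in range(r)]               # row effects
    b = [sum(y[i][j] for i in range(r)) / r - g for j in range(c)]
    cells = [(i, j) for i in range(r) for j in range(c)]
    s3 = sum((y[i][j] - a[i] - b[j] - g) ** 2 for i, j in cells)
    L = sum(a[i] * b[j] * y[i][j] for i, j in cells)
    c2 = sum(x * x for x in a) * sum(x * x for x in b)
    return s3, L ** 2 / c2


def f_statistic(s3, s5, r, c):
    """Variance ratio with (1, rc - r - c) d.fr.; undefined if s3 == s5."""
    return (r * c - r - c) * s5 / (s3 - s5)
```

When the interactions are exactly proportional to the products of row and column effects, the single component absorbs the whole of the residual sum of squares and the ratio is undefined, which is why `f_statistic` is kept separate.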


Choice of weights 

35.26 We must now discuss a point which we deliberately passed over in formulating our linear model in 35.17-19. We observed there that we had $(r+c+1)$ parameters in our original model which were redundant in the sense that they were linearly dependent upon the rc other parameters, and we therefore eliminated them using the set of linear relations given in (35.33), leading to (35.34), which determined the structure of the basic matrix (35.35). It is now necessary to recognize that the set of relations given in (35.33) is essentially arbitrary. In the first relation given there, for example, we chose to equate to zero the particular linear function $\sum_i n_{i.}\theta_{i.}$, using as weights the marginal row frequencies $n_{i.}$. This may seem natural, but it is by no means necessary: we might have chosen instead to use equal weights, so that $\sum_i \theta_{i.} = 0$, or indeed any weights $w_i$, so that $\sum_i w_i\theta_{i.} = 0$.

If the complete set of n observations were a simple random sample from some population, the observed $n_{i.}/n$ would be estimates of the population relative frequencies in the row categories, and it would therefore be meaningful to define the row-effects using these weights to express their linear dependence. Similarly, $\sum_j n_{.j}\theta_{.j} = 0$, and the other relations in (35.33) would be meaningful in the same context. We call these the frequency weights.


35.27 However, in many experimental contexts there is no question of the observations being a random sample from some population: the r x c cross-classification is deliberately set up to throw light on the variable (y) being studied. The use of observed frequencies as weights in the linear relations (35.33) is then no longer readily interpretable. It may even be meaningless to consider any set of weights as the "right" ones, in the sense of reflecting an underlying population distribution; for example, if we have a 2 x 3 cross-classification to study the effects of two different doses of Fertilizer A and three different doses of Fertilizer B on the yield of a crop (y), one may be simply interested in the effects and interactions as such, and not as representing any population at all. There is a crucial distinction here between the "experimental" and the "survey" approach to data, to which we shall revert in Chapters 38-9.

In experimental investigations, therefore, it is common (for lack of any known appropriate system of weights) to use equal weights throughout (35.33). For the remainder of this chapter, the equal-weights system means that (35.33) holds with all symbols $n_{i.}$, $n_{.j}$, $n_{ij}$ suppressed, i.e. replaced by 1's. It is to be observed that, whereas in the balanced case (cf. 35.25) the equal-weights system has in effect already been used, simply because the frequencies were equal, our general results for the proportional-frequencies case (in 35.22-4 and Example 35.2) would not hold unless the frequency weights were used.

Now that we are about to resume discussion of the general disproportional-fre- 
quencies case, which we left in 35.21, the distinction between these two weighting 
systems will become acute, if only because, in this most general case, we may have a 
very small number of very large frequencies which tend to dominate the frequency 
weighting system, and perhaps distort its interpretation. 


35.28 The choice of weights in (35.33) will in general affect the estimation of all the parameters, general mean, row- and column-effects, and interactions. However, if the true interactions $\theta_{ij}$ are all zero under any weighting system, they will be so for all weighting systems, as Scheffé (1959) proves. For under the first weighting system, (35.31) shows that $\theta_{ij}^{(1)} = 0$ is equivalent to

$$
\mu_{ij} = \mu_{i.}^{(1)} + \mu_{.j}^{(1)} - \mu_{..}^{(1)}.
$$

Under any other weighting system, the interactions are, from (35.31),

$$
\begin{aligned}
\theta_{ij}^{(2)} &= \mu_{ij} - \mu_{i.}^{(2)} - \mu_{.j}^{(2)} + \mu_{..}^{(2)} \\
 &= \bigl(\mu_{i.}^{(1)} + \mu_{.j}^{(1)} - \mu_{..}^{(1)}\bigr) - \bigl(\mu_{i.}^{(2)} + \mu_{.j}^{(2)} - \mu_{..}^{(2)}\bigr) \\
 &= \bigl(\mu_{i.}^{(1)} - \mu_{i.}^{(2)}\bigr) + \bigl(\mu_{.j}^{(1)} - \mu_{.j}^{(2)}\bigr) - \bigl(\mu_{..}^{(1)} - \mu_{..}^{(2)}\bigr).
\end{aligned}
$$

This is of the form

$$
\theta_{ij}^{(2)} = a_i + b_j + c,
$$

and it is evident from the definition of interactions in 35.18 that if they were representable as the sum of row-, column- and general components, these would be absorbed into the row-effects, column-effects and the general mean respectively, leaving the interaction equal to zero. We thus have $\theta_{ij}^{(2)} = 0$ for all i, j. If $H_3$ of (35.54) holds, therefore, it holds for every weighting system.
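The invariance can be illustrated on a small table of cell means. The sketch below is not part of the original text and rests on an assumption: the weighting systems are taken to be of the product type, with row weights `u` and column weights `v` each summing to unity, so that the marginal means are weighted averages of the cell means and the interaction is the cell mean minus the two marginal means plus the general mean. The function name `interactions` and the example tables are hypothetical.

```python
from fractions import Fraction


def interactions(mu, u, v):
    """Interactions of the cell means mu under row weights u, column weights v."""
    r, c = len(mu), len(mu[0])
    row = [sum(v[j] * mu[i][j] for j in range(c)) for i in range(r)]  # mu_i.
    col = [sum(u[i] * mu[i][j] for i in range(r)) for j in range(c)]  # mu_.j
    g = sum(u[i] * row[i] for i in range(r))                          # mu_..
    return [[mu[i][j] - row[i] - col[j] + g for j in range(c)]
            for i in range(r)]


# Two weighting systems for a 3 x 2 table: equal weights and an unequal set.
equal_u, equal_v = [Fraction(1, 3)] * 3, [Fraction(1, 2)] * 2
other_u = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
other_v = [Fraction(1, 5), Fraction(4, 5)]

additive = [[1, 2], [3, 4], [6, 7]]        # mu_ij = constant + a_i + b_j
non_additive = [[1, 2], [3, 4], [6, 9]]
```

For the additive table the interactions vanish under both systems; for the non-additive table they are non-zero and differ between the two systems, which is why the choice of weights matters once interactions are present.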


Disproportional frequencies 

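35.29 We first use the frequency weights (35.33) as before. The proportionality condition (35.43) does not hold, so that the matrix D in (35.36) is non-null, and the simplified analysis of 35.22-5 is no longer valid. It remains true even in this most general case that (35.36) may be partitioned into

$$
X'X = \begin{pmatrix} n & & 0 \\ & E & \\ 0 & & C \end{pmatrix},
\qquad
E = \begin{pmatrix} A & D \\ D' & B \end{pmatrix},
$$

of which (35.41) was the special case D = 0. The inverse of this can still be written down, for

$$
E^{-1} = \begin{pmatrix} A & D \\ D' & B \end{pmatrix}^{-1}
= \begin{pmatrix} (A - DB^{-1}D')^{-1} & 0 \\ 0 & (B - D'A^{-1}D)^{-1} \end{pmatrix}
  \begin{pmatrix} I & -DB^{-1} \\ -D'A^{-1} & I \end{pmatrix},
\tag{35.70}
$$

as may be verified by multiplication, so that

$$
(X'X)^{-1} = \begin{pmatrix} n^{-1} & & 0 \\ & E^{-1} & \\ 0 & & C^{-1} \end{pmatrix}.
\tag{35.71}
$$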

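The effect of the non-nullity of D on the LS analysis is to change the estimator of the parameter-vector $\theta$, for although the partial vector (35.46) is unchanged the $(r+c-1)\times(r+c-1)$ leading diagonal submatrix of (35.42) is now replaced by that of (35.71). If we write (35.46) concisely as

$$
\begin{pmatrix} n\,\bar{y}_{...} \\ v_{r-1} \\ v_{c-1} \end{pmatrix},
$$

each v being the subvector of (35.46) with number of rows indicated by its suffix, we may generalize (35.47), using (35.70-1), to

$$
\begin{pmatrix} \hat\theta_{..} \\ \hat{\boldsymbol\theta}_{i.} \\ \hat{\boldsymbol\theta}_{.j} \end{pmatrix}
= \begin{pmatrix} n & 0 & 0 \\ 0 & A & D \\ 0 & D' & B \end{pmatrix}^{-1}
  \begin{pmatrix} n\,\bar{y}_{...} \\ v_{r-1} \\ v_{c-1} \end{pmatrix}
= \begin{pmatrix}
\bar{y}_{...} \\
(A - DB^{-1}D')^{-1}(v_{r-1} - DB^{-1}v_{c-1}) \\
(B - D'A^{-1}D)^{-1}(v_{c-1} - D'A^{-1}v_{r-1})
\end{pmatrix}.
\tag{35.72}
$$

Thus the estimators $\hat\theta_{i.}$, $\hat\theta_{.j}$ are numerically determinable, while $\hat\theta_{..} = \bar{y}_{...}$ always, as is intuitively obvious. As in 35.24, the definition of the interactions at (35.31) then implies that their estimators satisfy

$$
\hat\theta_{ij} = \bar{y}_{ij.} - \hat\theta_{i.} - \hat\theta_{.j} - \bar{y}_{...},
\tag{35.73}
$$

so that the LS estimators of all the parameters are determined. The generalization of the decomposition (35.57) is

$$
\hat{\boldsymbol\theta}' X'X \hat{\boldsymbol\theta} = n\,\bar{y}_{...}^{2}
 + \bigl(\hat{\boldsymbol\theta}_{i.}' \;\; \hat{\boldsymbol\theta}_{.j}'\bigr)
   \begin{pmatrix} A & D \\ D' & B \end{pmatrix}
   \begin{pmatrix} \hat{\boldsymbol\theta}_{i.} \\ \hat{\boldsymbol\theta}_{.j} \end{pmatrix}
 + \hat{\boldsymbol\theta}_{ij}' C\, \hat{\boldsymbol\theta}_{ij}.
\tag{35.74}
$$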


Example 35.4 Two-way cross-classification with disproportional frequencies and frequency 
weights 

(35.74) shows at once that $H_3$ and $H_4$ of (35.54-5) can each be tested in the manner of Example 35.2, although the SS attributable to the interactions must now be numerically evaluated from the last term on the right of (35.74). Thus both the general mean and the interactions MS can be tested (with 1 and $(r-1)(c-1)$ d.fr. respectively) against the Residual MS, since they are non-central F variables, irrespective of the row- and column-effects.(*)

The SS attributable to the row- and column-effects jointly is the middle term on the right of (35.74), say M. $H_1$ and $H_2$ of (35.52-3) could therefore be tested jointly by calculating M. However, practical interest usually lies in testing row- and column-effects separately. The SS attributable to row-effects, for example, would be obtained by calculating the reduction in the Residual SS brought about by first estimating all parameters except $\theta_{i.}$ and then estimating all parameters including $\theta_{i.}$; this is the SS attributable to $\theta_{i.}$. Similarly, $\theta_{.j}$ can have its SS evaluated. These two SS will not add to M, since row- and column-effects are not orthogonal in general.

(*) It should be particularly noted that if equal weights were used instead of frequency weights in (35.33), the SS attributable to interactions would not be a separate component in (35.74) but would be entangled with that for row- and column-effects just as the latter are entangled with each other. However, $H_3$ would hold, if true, whichever weighting system were used, in virtue of the result of 35.28.

If it can be assumed that all interactions are zero, the situation simplifies (cf. Exercise 
35.4). 


35.30 Use of the equal-weights system instead of (35.33) makes the testing of 
row- and of column-effects computationally a good deal simpler than with the frequency 
weights used in Example 35.4. We may proceed directly as follows. 

Suppose that we first analyse the r x c cross-classification as if it were a one-way classification with k = rc. We then obtain an AV with $n-rc$ d.fr. for Residual, the remaining rc d.fr. being attributable to the combined effect of row- and column-classifications. Using Example 35.1, we find the AV table below:

AV for any r x c cross-classification

Variation                    D.fr.      SS
Classification as a whole    $rc$       $\sum_i\sum_j n_{ij}\,\bar{y}_{ij.}^2$
Residual                     $n-rc$     $\sum_i\sum_j\sum_p (y_{ijp}-\bar{y}_{ij.})^2$
Total                        $n$        $\sum_i\sum_j\sum_p y_{ijp}^2$
(35.75)

It is clear that any cell for which $n_{ij} = 1$ can contribute nothing to the Residual MS (cf. 35.25 and Example 35.3 where every cell has $n_{ij} = 1$). To split the rc d.fr. for the classification into its component parts due to row-effects, column-effects, interactions and the general mean, we need only analyse the cell means $\bar{y}_{ij.}$.


35.31 From the model (35.32), it follows that the $\bar{y}_{ij.}$ satisfy

$$
\bar{y}_{ij.} = \theta_{..} + \theta_{i.} + \theta_{.j} + \theta_{ij} + \varepsilon_{ij},
\tag{35.76}
$$

where the errors $\varepsilon_{ij}$ are uncorrelated, with zero means and variances $\sigma^2/n_{ij}$. Suppose now that we average the $\bar{y}_{ij.}$ over columns, i.e. take the unweighted mean $\frac{1}{c}\sum_{j=1}^{c}\bar{y}_{ij.} = \tilde{y}_i$, say. It then follows from (35.76) that

$$
\tilde{y}_i = \theta_{..} + \theta_{i.} + \frac{1}{c}\sum_{j=1}^{c}\theta_{.j} + \frac{1}{c}\sum_{j=1}^{c}\theta_{ij} + e_i,
\tag{35.77}
$$

where the $e_i$, uncorrelated with zero mean, have variances $(\sigma^2/c^2)\sum_j n_{ij}^{-1} = \sigma^2 V_i$, say. If we use equal weights instead of the frequency weights in (35.33), the two summations on the right of (35.77) are each equal to zero and we have simply

$$
\tilde{y}_i = \theta_{..} + \theta_{i.} + e_i.
\tag{35.78}
$$

(35.78) is the one-way classification model (Example 35.1) with a single observation in each group, except that the error variances are not equal. If we define $z_i = \tilde{y}_i/V_i^{1/2}$, we have $E(z_i) = (\theta_{..}+\theta_{i.})/V_i^{1/2}$, and the conditions of Example 35.1 are otherwise satisfied. The effect of this on the analysis is to replace $S_1$ defined at (35.21), which in our present application would be $\sum_i \bigl(\tilde{y}_i - \frac{1}{r}\sum_i \tilde{y}_i\bigr)^2$, by the same sum with each term given the coefficient $V_i^{-1}$, i.e.

$$
S_1 = \sum_{i=1}^{r} V_i^{-1}(\tilde{y}_i - \tilde{y})^2 = c^2\sum_{i=1}^{r}\Bigl(\sum_{j=1}^{c} n_{ij}^{-1}\Bigr)^{-1}(\tilde{y}_i - \tilde{y})^2,
$$

where

$$
\tilde{y} = \sum_{i=1}^{r}\Bigl(\sum_{j=1}^{c} n_{ij}^{-1}\Bigr)^{-1}\tilde{y}_i \Bigg/ \sum_{i=1}^{r}\Bigl(\sum_{j=1}^{c} n_{ij}^{-1}\Bigr)^{-1}
$$

is the weighted mean of the $\tilde{y}_i$ using $V_i^{-1}$ as weights.

We therefore have an AV table as follows:

AV for r x c cross-classification using equal weights

Variation                      D.fr.        SS
Due to rows                    $r-1$        $\sum_i V_i^{-1}(\tilde{y}_i-\tilde{y})^2$
Remainder of classification    $r(c-1)+1$   [Obtainable as a difference]
(Subtotal)                     $rc$
Residual                       $n-rc$       As in (35.75)
Total                          $n$          $\sum_i\sum_j\sum_p y_{ijp}^2$
(35.79)


An exactly analogous breakdown of the "classification" SS can be made for the columns-classification. We therefore have tests of row- and of column-effects in the general case. "Rows" and "Columns" SS cannot be added, because of their non-orthogonality, so that we cannot obtain the Interactions SS by differencing. However, if r or c = 2, a test for interactions is easily derived by this method; cf. Exercise 35.5.
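The rows sum of squares under equal weights can be computed directly from the cell means and cell frequencies. The sketch below is illustrative and not part of the original text; the function name `weighted_squares_of_means_rows` is hypothetical. It assumes that the unweighted row mean of the cell means has variance proportional to $V_i = c^{-2}\sum_j n_{ij}^{-1}$, and that the sum of squares is the sum of squared deviations of these row means from their weighted mean, each with coefficient $V_i^{-1}$.

```python
from fractions import Fraction


def weighted_squares_of_means_rows(cell_means, n):
    """Return (S1, row means of cell means, V) for the rows classification.

    cell_means[i][j] : mean of the observations in cell (i, j)
    n[i][j]          : number of observations in cell (i, j), all positive
    """
    r, c = len(n), len(n[0])
    y_tilde = [sum(Fraction(m) for m in cell_means[i]) / c for i in range(r)]
    V = [sum(Fraction(1, n[i][j]) for j in range(c)) / c ** 2 for i in range(r)]
    w = [1 / v for v in V]                                  # weights 1/V_i
    centre = sum(wi * yi for wi, yi in zip(w, y_tilde)) / sum(w)
    s1 = sum(wi * (yi - centre) ** 2 for wi, yi in zip(w, y_tilde))
    return s1, y_tilde, V
```

The resulting sum of squares, divided by its $r-1$ d.fr., would then be compared with the Residual MS of the one-way analysis of the rc cells; interchanging the roles of rows and columns gives the corresponding quantity for columns.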


35.32 The equal-weights system, whose use permitted the development of the AV (35.79), is used naturally in this context, since in effect we reduce the n observations to a set of rc means and then analyse these as though they were individual observations. There is nothing, of course, to prevent a full analysis using this equal-weights system, instead of that in 35.29, and indeed this is the customary procedure. If this is done, the results in general are different from those of 35.29.

35.33 The method used in 35.31 is due to Yates (1934), who called it the method of weighted squares of means. He also discusses other, more approximate, methods of analysis, as also do Snedecor (1946) and R. L. Anderson and Bancroft (1952). All these authors give numerical examples (cf. Exercise 35.7). Scheffé (1959) gives further theoretical details of the disproportional-frequencies analysis; in particular he allows arbitrary weight-systems in (35.33).


Empty cells in the two-way cross-classification 

35.34 Throughout our analysis of the two-way cross-classification in 35.15-33, we made the implicit assumption that every cell in the table (35.28) contained at least one observation, i.e. $n_{ij} > 0$. In practice, it quite frequently occurs that this assumption is not fulfilled, as a result of accident, experimental failure, or other causes. We must now consider the effects which the presence of empty cells in the classification will have on the analysis of the observations. If there is at least one empty cell, the cross-classification is called incomplete. We have so far discussed only complete cross-classifications.

We clearly cannot estimate the mean $\mu_{ij}$ of an empty cell in the general case where the corresponding interaction $\theta_{ij}$ is non-zero, for we can get no information on $\theta_{ij}$ from other cells in the table. It follows from the definitions (35.31) and the linear relations (35.33) that none of the $\theta_{ij}$, $\theta_{i.}$, $\theta_{.j}$ or $\theta_{..}$ can be estimated in the general case if there are one or more empty cells in the cross-classification. However, even in this case we can estimate the error variance quite easily. If we denote the number of cells containing observations by [rc], we obtain the more general form of (35.75):

AV for any r x c cross-classification(*)

Variation                D.fr.        SS
Due to classification    $[rc]$       $\sum_i\sum_j n_{ij}\,\bar{y}_{ij.}^2$
Residual                 $n-[rc]$     $\sum_i\sum_j\sum_p (y_{ijp}-\bar{y}_{ij.})^2$
Total                    $n$          $\sum_i\sum_j\sum_p y_{ijp}^2$
(35.80)
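The Residual line of this table is easily computed even when some cells are empty. The following is a minimal sketch, not part of the original text; the function name `residual_ss` and the representation of an empty cell by an empty list are assumptions made for illustration.

```python
def residual_ss(y):
    """Return (Residual SS, Residual d.fr., number of occupied cells).

    y[i][j] is the list of observations in cell (i, j); an empty list
    denotes an empty cell, which is skipped in every summation.
    """
    occupied = [cell for row in y for cell in row if cell]
    n = sum(len(cell) for cell in occupied)
    ss = 0.0
    for cell in occupied:
        mean = sum(cell) / len(cell)
        ss += sum((v - mean) ** 2 for v in cell)
    return ss, n - len(occupied), len(occupied)
```

The Residual MS, the first returned value divided by the second, estimates the error variance whenever at least one occupied cell holds more than one observation.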


35.35 If the $\theta_{ij}$ in empty cells are zero, the difficulty in 35.34 disappears. Thus, if we wish to test $H_3$ of (35.54) (the hypothesis that all interactions are zero) we may proceed, as in Example 35.4, to estimate the remaining parameters, evaluate the SS due to them, and thus obtain a Residual SS. The difference between the latter and the Residual SS in (35.80) will be attributable to interactions and have $[rc]-r-c+1$ d.fr. Similarly, if $H_3$ can be postulated, row- or column-effects can be tested as for a complete classification by the method given in Exercise 35.4. Scheffé (1959) gives further details.

We shall not discuss the matter further here. In 37.50-6 below, we shall be giving general methods for the analysis of linear models when there are observations missing.

(*) In (35.80) summations range over the [rc] occupied cells only.


Hierarchical classifications 

35.36 The two-way cross-classification which we have treated at length in 35.15-35 is not the only interesting generalization of the one-way classification in Example 35.1. Suppose that, within each of the k groups there considered, there is a further one-way classification of the observations. The $n_1$ observations in the first group are in $l_1$ sub-groups, with frequencies $n_{11}, n_{12}, \ldots, n_{1l_1}$, where $\sum_h n_{1h} = n_1$; the second group similarly has $l_2$ sub-groups, with frequencies $n_{21}, n_{22}, \ldots, n_{2l_2}$, $\sum_h n_{2h} = n_2$; and so on until in the kth group there are $l_k$ sub-groups with frequencies $n_{k1}, n_{k2}, \ldots, n_{kl_k}$, $\sum_h n_{kh} = n_k$. It will accord better with our notational conventions if we now replace the original group frequencies $n_i$ of Example 35.1 by $n_{i.}$, to denote summation of the sub-group frequencies $n_{ih}$ within the original groups. Thus we have

$$
\sum_{h=1}^{l_i} n_{ih} = n_{i.}.
$$

This is a two-way hierarchical classification(*) of the observations, the separate sub-grouping within each of the original groups contrasting with the common row-grouping of every column category in a two-way cross-classification.


Example 35.5 AV in a two-way hierarchical classification 

In Example 35.1 we have already defined k parameters, one for each group. In
order to investigate variation in the means $\theta_{ih}$(†) of the $l_i$ sub-groups within the ith group,
we use only $l_i - 1$ linearly independent parameters, for we may put

$$\sum_{h=1}^{l_i} n_{ih}\,\theta_{ih} = 0$$

(cf. 35.17-19 for the cross-classification) so that, as at (35.34),

$$\theta_{il_i} = -\frac{1}{n_{il_i}} \sum_{h=1}^{l_i-1} n_{ih}\,\theta_{ih}. \qquad (35.81)$$

We may now generalize the linear model in Example 35.1. We write $l = \sum_{i=1}^{k} l_i$,
and $y_{ihp}$ for the pth observation in the hth sub-group of the ith group. We have


(*) The alternative term “nested classification” appears to be more easily taken to imply
that there is an equal number of sub-groups in each original group, and we therefore do not
use it, despite its appealing cosiness.

(†) These $\theta_{ih}$ are not related to the interaction parameters in the cross-classification.
Interaction problems do not arise here.
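$$\mathbf{y} = \Big(\ \underset{(n\times k)}{\mathbf{X}}\ \Big|\ \operatorname{diag}\Big\{\underset{(n_{1.}\times(l_1-1))}{\mathbf{X}},\ \underset{(n_{2.}\times(l_2-1))}{\mathbf{X}},\ \ldots,\ \underset{(n_{k.}\times(l_k-1))}{\mathbf{X}}\Big\}\Big)\,\boldsymbol{\theta} + \boldsymbol{\epsilon}. \qquad (35.82)$$

The (n × k) X submatrix in (35.82) is that used in Example 35.1. Each of the other
submatrices X is of the form

$$\underset{(n_{i.}\times(l_i-1))}{\mathbf{X}} =
\begin{pmatrix}
\mathbf{1} & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{1} & \cdots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \mathbf{1} \\
-\dfrac{n_{i1}}{n_{il_i}}\mathbf{1} & -\dfrac{n_{i2}}{n_{il_i}}\mathbf{1} & \cdots & -\dfrac{n_{i,l_i-1}}{n_{il_i}}\mathbf{1}
\end{pmatrix}
\begin{matrix}
n_{i1}\ \text{rows} \\ n_{i2}\ \text{rows} \\ \vdots \\ n_{i,l_i-1}\ \text{rows} \\ n_{il_i}\ \text{rows}
\end{matrix}
\qquad (35.83)$$

(each $\mathbf{1}$ or $\mathbf{0}$ standing for a column of ones or zeros with the number of rows shown on the right),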
which follows at once from (35.81). (35.82-3) now give 


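$$\underset{(l\times l)}{\mathbf{X}'\mathbf{X}} =
\begin{pmatrix}
n_{1.} & & & & & \\
 & \ddots & & & \mathbf{0} & \\
 & & n_{k.} & & & \\
 & & & \mathbf{A}_1 & & \\
 & \mathbf{0} & & & \ddots & \\
 & & & & & \mathbf{A}_k
\end{pmatrix}
\qquad (35.84)$$

where

$$\underset{((l_i-1)\times(l_i-1))}{\mathbf{A}_i} =
\begin{pmatrix}
n_{i1} + \dfrac{n_{i1}^2}{n_{il_i}} & \dfrac{n_{i1}n_{i2}}{n_{il_i}} & \cdots & \dfrac{n_{i1}n_{i,l_i-1}}{n_{il_i}} \\
\dfrac{n_{i1}n_{i2}}{n_{il_i}} & n_{i2} + \dfrac{n_{i2}^2}{n_{il_i}} & \cdots & \vdots \\
\vdots & & \ddots & \vdots \\
\dfrac{n_{i1}n_{i,l_i-1}}{n_{il_i}} & \cdots & \cdots & n_{i,l_i-1} + \dfrac{n_{i,l_i-1}^2}{n_{il_i}}
\end{pmatrix}$$

is of the same form as (35.37) and therefore has the inverse, of the same form as (35.44),

$$\mathbf{A}_i^{-1} =
\begin{pmatrix}
\dfrac{1}{n_{i1}} - \dfrac{1}{n_{i.}} & -\dfrac{1}{n_{i.}} & \cdots & -\dfrac{1}{n_{i.}} \\
-\dfrac{1}{n_{i.}} & \dfrac{1}{n_{i2}} - \dfrac{1}{n_{i.}} & \cdots & \vdots \\
\vdots & & \ddots & \vdots \\
-\dfrac{1}{n_{i.}} & \cdots & \cdots & \dfrac{1}{n_{i,l_i-1}} - \dfrac{1}{n_{i.}}
\end{pmatrix}.
\qquad (35.85)$$

Hence, from (35.84-5), the LS estimators are

$$\hat{\boldsymbol{\theta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} =
\begin{pmatrix}
y_{1..} \\ \vdots \\ y_{k..} \\ y_{11.} - y_{1..} \\ \vdots \\ y_{1,l_1-1.} - y_{1..} \\ \vdots \\ y_{k1.} - y_{k..} \\ \vdots \\ y_{k,l_k-1.} - y_{k..}
\end{pmatrix}
\qquad (35.86)$$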


and thus the SS due to the fitted model is

$$\hat{\boldsymbol{\theta}}'\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\theta}} = \sum_{i=1}^{k} n_{i.}\, y_{i..}^2 + \sum_{i=1}^{k}\sum_{h=1}^{l_i} n_{ih}\,(y_{ih.} - y_{i..})^2. \qquad (35.87)$$

The first term on the right of (35.87) is precisely the SS defined at (35.16) in Example
35.1. What we have now done is to partition off a further SS, the second term on
the right of (35.87). Since the first term was the SS due to the k original group parameters
$\theta_1, \theta_2, \ldots, \theta_k$, the second SS is that attributable to the $\sum_{i=1}^{k}(l_i - 1) = l - k$ linearly
independent sub-group parameters now introduced. We may summarize our result in
the table:


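AV for a two-way hierarchical classification

Variation                            D.fr.    SS
Due to groups                        k        $\sum_{i} n_{i.}\, y_{i..}^2$
Between sub-groups within groups     l − k    $\sum_{i}\sum_{h} n_{ih}\,(y_{ih.} - y_{i..})^2$          (35.88)
Residual                             n − l    $\sum_{i}\sum_{h}\sum_{p} (y_{ihp} - y_{ih.})^2$
Total                                n        $\sum_{i}\sum_{h}\sum_{p} y_{ihp}^2$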


The Residual SS in (35.88) is obtained by subtraction. The first row of the table
may be split into “Between groups” and “General mean” components as in
Example 35.1.

The ratio of the “Between sub-groups within groups” MS to the Residual MS is,
from our general theory, a non-central F-variable with d.fr. (l−k, n−l) and
non-central parameter $\lambda = \sum_{i}\sum_{h} n_{ih}\,\theta_{ih}^2/\sigma^2$, which is zero when all sub-group means within
each group are equal, giving a central F-test for this hypothesis.
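
The arithmetic of table (35.88) may be illustrated by a short sketch. It is not part of the original treatment: the function name `hierarchical_av` and the small data set are hypothetical, and the data are supplied as nested lists (groups, then sub-groups, then observations).

```python
def hierarchical_av(groups):
    """Return the rows of table (35.88) as {label: (d.fr., SS)}.

    groups: list of groups; each group is a list of sub-groups;
            each sub-group is a list of observations.
    """
    k = len(groups)                                   # number of groups
    l = sum(len(g) for g in groups)                   # total number of sub-groups
    n = sum(len(s) for g in groups for s in g)        # total number of observations
    ss_groups = ss_sub = ss_resid = ss_total = 0.0
    for g in groups:
        obs = [y for s in g for y in s]
        g_mean = sum(obs) / len(obs)                  # y_{i..}
        ss_groups += len(obs) * g_mean ** 2           # n_{i.} y_{i..}^2
        for s in g:
            s_mean = sum(s) / len(s)                  # y_{ih.}
            ss_sub += len(s) * (s_mean - g_mean) ** 2
            ss_resid += sum((y - s_mean) ** 2 for y in s)
            ss_total += sum(y * y for y in s)
    return {
        "groups": (k, ss_groups),
        "sub-groups within groups": (l - k, ss_sub),
        "residual": (n - l, ss_resid),
        "total": (n, ss_total),
    }


# Hypothetical data: 2 groups with 2 and 3 sub-groups of unequal size.
data = [
    [[4.0, 6.0], [7.0, 9.0, 8.0]],
    [[1.0, 3.0, 2.0], [5.0], [2.0, 4.0]],
]
table = hierarchical_av(data)
df_sub, ss_sub = table["sub-groups within groups"]
df_res, ss_res = table["residual"]
variance_ratio = (ss_sub / df_sub) / (ss_res / df_res)   # referred to F with (l-k, n-l) d.fr.
```

The three component SS add to the Total SS whatever the sub-group frequencies, which gives a convenient numerical check on the computation.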


35.37 The hierarchical process can clearly be carried further, with sub-sub-groups
and even sub-sub-sub-groups. These would be termed three-way and four-way
hierarchical classifications, and are relatively rare in practice. It should be obvious
to the reader that there is no need to go through again the rather tedious algebra of
LS theory to obtain the results we need here; the work of Example 35.5 essentially
split the SS within each of the k groups into two components, with $(l_i - 1)$ and $(n_{i.} - l_i)$
d.fr. respectively, and summed corresponding components over all groups to obtain
the (l−k) and (n−l) d.fr. in the table (35.88). The same splitting-off process can
now be carried out within each sub-group, and so on. The reader is asked to verify
the three-way AV in Exercise 35.3. Scheffé (1959) gives theoretical details for the
three-way case.


Multi-way classifications 

35.38 We have just outlined the treatment of multi-way hierarchical classifications, 
and this leads us to a consideration of multi-way classifications in general. 

We first note that, as soon as we consider three-way classifications, there is the 
possibility of “mixed” classifications which are partly hierarchical and partly
cross-classifications. These arise when a two-way hierarchical classification forms one
(say, the row-) classification of an r × c cross-classification. In the notation of Example
35.5, we here have r = l. The AV is carried out in two stages. First, the
cross-classification is analysed by the methods already discussed, and the Total SS is resolved
into the usual five components, which we represent concisely by their d.fr. in the
following table:

SS due to        D.fr.           subdivided into                  D.fr.
General mean     1
Rows             l−1             Row groups                       k−1
                                 Row sub-groups                   l−k
Columns          c−1
Interactions     (l−1)(c−1)      Interactions with groups         (k−1)(c−1)
                                 Interactions with sub-groups     (l−k)(c−1)
Residual         n−lc
Total            n


At the second stage, each of the SS involving the hierarchical (row) classification 
is subdivided into two parts as indicated on the right. The first of these subdivisions 
is a direct application of Example 35.5 (it being remembered that the general mean 
component has here already been removed from the first line of (35.88) by the cross- 
classification analysis), but the simplest way of achieving both subdivisions is to merge 
all sub-groups within the groups of the hierarchical classification and recalculate the 
SS for Rows and Interactions using the merged data—these are the required component 
SS, with (k−1) and (k−1)(c−1) d.fr. respectively. The sub-groups SS are then
obtained as differences if the analysis is orthogonal.

Scheffé (1959) gives theoretical details for the case where there is the same number 
of sub-groups in each group of the hierarchical classification and the same number 
of observations in each cell of the l × c table.


35.39 Suppose now that, instead of embedding a two-way hierarchical classification 
within a two-way cross-classification as in 35.38, we carry out a new one-way classifica- 
tion within each cell of a cross-classification. If the same one-way classification is 
carried out in each cell, we clearly arrive at a three-way cross-classification. All the 
problems of formulating the linear model, discussed in 35.15-19 for the two-way case, 
now arise afresh, and some generalization of our concepts is required, as we shall now 
see. 


The three-way cross-classification 

35.40 Following the nomenclature already used in the treatment of three-way 
tables of categorized data in 33.58, we now consider a sample of n observations classified 
into an r × c × l table with r rows, c columns and l “layers,” with frequencies $n_{ijk}$,
where i = 1, 2, ..., r; j = 1, 2, ..., c; k = 1, 2, ..., l. The pth observation in
the (i, j, k) cell is $y_{ijkp}$, p = 1, 2, ..., $n_{ijk}$. As in 35.15, we follow the notational
rule that a dot replacing a suffix indicates summation if the suffix is to n, and averaging
if the suffix is to y, and we replace $n_{...}$ by n.

We can clearly ask the questions of 35.15 about all three classifications in this more 
general situation, so that we are now interested in the general mean, in row-effects, 
column-effects and layer-effects (the three sets of main effects), and in the interactions 
between any pair of the row-, column- and layer-classifications. However, there is 
a new feature introduced by the additional classification, for we may now wish to 
know whether the interactions between, say, rows and columns themselves depend 
upon the layer-classification. We are here concerned with a higher-order interaction, 
which we must proceed to define. 


35.41 We write the linear model

$$y_{ijkp} = \mu_{ijk} + \epsilon_{ijkp} = \theta_{***} + \theta_{i**} + \theta_{*j*} + \theta_{**k} + \theta_{ij*} + \theta_{i*k} + \theta_{*jk} + \theta_{ijk} + \epsilon_{ijkp} \qquad (35.89)$$

the generalization of (35.32). $\mu_{ijk}$, the mean of the observations in the (i, j, k)th cell,
is made up of eight components: the general mean $\theta_{***}$; the row-effects $\theta_{i**}$,
column-effects $\theta_{*j*}$, and layer-effects $\theta_{**k}$; the row-column interactions $\theta_{ij*}$, the row-layer
interactions $\theta_{i*k}$, and the column-layer interactions $\theta_{*jk}$. All these are defined exactly
as in 35.17-18 for the two-way cross-classification, with an extra asterisk in every
suffix. The last set of interaction parameters on the right of (35.89), the $\theta_{ijk}$, are defined
by extending the argument of 35.18. If the deviation of $\mu_{ijk}$ from the general mean
$\theta_{***} = \mu_{***}$ were exactly equal to the sum of the three corresponding main effects and
the three corresponding interactions already defined, we should have

$$\begin{aligned}
\mu_{ijk} - \mu_{***} &= \theta_{i**} + \theta_{*j*} + \theta_{**k} + \theta_{ij*} + \theta_{i*k} + \theta_{*jk} \\
&= (\mu_{i**} - \mu_{***}) + (\mu_{*j*} - \mu_{***}) + (\mu_{**k} - \mu_{***}) \\
&\quad + (\mu_{ij*} - \mu_{i**} - \mu_{*j*} + \mu_{***}) + (\mu_{i*k} - \mu_{i**} - \mu_{**k} + \mu_{***}) \\
&\quad + (\mu_{*jk} - \mu_{*j*} - \mu_{**k} + \mu_{***}),
\end{aligned}$$

or $\theta_{ijk} = 0$ where (compare the definition of $\theta_{ij}$ in (35.31))

$$\theta_{ijk} = \mu_{ijk} - (\mu_{ij*} + \mu_{i*k} + \mu_{*jk}) + (\mu_{i**} + \mu_{*j*} + \mu_{**k}) - \mu_{***}. \qquad (35.90)$$

We call the $\theta_{ijk}$ as defined by (35.90) the second-order interactions between rows,
columns and layers. The interactions in a two-way cross-classification are now
retrospectively re-defined as first-order interactions. For complete terminological regularity,
we may also refer to the main effects themselves as zero-order interactions.
If we write (35.90) in the form

$$\theta_{ijk} = \{\mu_{ijk} - (\mu_{i*k} + \mu_{*jk}) + \mu_{**k}\} - \{\mu_{ij*} - (\mu_{i**} + \mu_{*j*}) + \mu_{***}\}, \qquad (35.91)$$

and refer to the definition of $\theta_{ij}$ at (35.31), we see that $\theta_{ijk}$ is in fact the difference between
a row-column (first-order) interaction in the kth layer and the same interaction in all
layers combined. Because $\theta_{ijk}$ is symmetrically defined as between rows, columns
and layers, (35.91) may be written equivalently as the difference between a row-layer
interaction in the jth column and in all columns combined, or as the difference between
a column-layer interaction in the ith row and in all rows combined. Thus the word
“interaction” is as apposite for $\theta_{ijk}$ here as it was for $\theta_{ij}$ in 35.18.
35.42 Analogously with 35.17, we can only have rcl linearly independent
parameters in our three-way cross-classification, one for each cell in the table. The model
(35.89), however, contains

$$1 + r + c + l + rc + rl + cl + rcl$$

parameters, and the surplus must be dropped to avoid singularity in the model. We
therefore drop the last of each set of main-effect parameters, and reduce the sets of
first-order interaction parameters to (r−1)(c−1), etc., just as in 35.17. This gives us

$$1 + (r-1) + (c-1) + (l-1) + (r-1)(c-1) + (r-1)(l-1) + (c-1)(l-1)$$

parameters excluding the second-order interactions. The reader may verify by
addition that (r−1)(c−1)(l−1) of these bring the total number of parameters to the
required number rcl; this is otherwise obvious from the fact that this is the number
of $\mu_{ijk}$ which can be independently determined when the other parameters are known.
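
The addition asked of the reader is easily mechanized. The following sketch (the function name `three_way_parameter_counts` is ours, introduced only for illustration) lists the numbers of linearly independent parameters of each kind in the model (35.89) and confirms that they total rcl.

```python
def three_way_parameter_counts(r, c, l):
    """Numbers of linearly independent parameters in the r x c x l model."""
    return {
        "general mean": 1,
        "row-effects": r - 1,
        "column-effects": c - 1,
        "layer-effects": l - 1,
        "row-column interactions": (r - 1) * (c - 1),
        "row-layer interactions": (r - 1) * (l - 1),
        "column-layer interactions": (c - 1) * (l - 1),
        "second-order interactions": (r - 1) * (c - 1) * (l - 1),
    }


counts = three_way_parameter_counts(3, 4, 2)
total = sum(counts.values())   # equals 3 * 4 * 2, one parameter per cell
```

The identity is simply the expansion of {1+(r−1)}{1+(c−1)}{1+(l−1)}, which is why the same counting extends at once to four- and more-way tables.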


35.43 Nothing but the heavy algebra now prevents our following through in 
detail an analysis parallel to that already carried out in the two-way case. There is
no numerical difficulty in any given case about fitting the linear model (35.89), estimat- 
ing its rcl parameters and carrying out the AV. Even in the two-way case, however, 
worthwhile simplifications in the algebra only occurred when the frequency in each 
cell was proportional to the product of marginal totals as required by (35.43). Similar 
orthogonality conditions now require that each cell frequency should be proportional 
to the product of all the corresponding marginal frequencies (cf. Seber (1964a)). This 
proportionality condition, in practice, is satisfied with equal frequencies in all the cells, 
i.e. in the balanced case.

The general principles of the analysis are then very simply set out. We saw in 35.39
that we may regard a three-way (r × c × l) cross-classification as having been generated
by imposing a new (l-fold) classification upon every cell of an existing (r × c)
cross-classification. It follows that the AV can be carried out in two stages, exactly as for
the “mixed” classification of 35.38. First, we consider the (r × c) cross-classification
as a one-way classification with rc cells, and carry out the AV of the rc × l two-way
cross-classification in which our observations are then displayed. We obtain the
schematic AV table from (35.65):

SS due to                            D.fr.           subdivided into                 D.fr.
General mean                         1
(r × c) cross-classification         rc−1            rows                            r−1
                                                     columns                         c−1
                                                     row-column interactions         (r−1)(c−1)
Layer classification                 l−1
Interactions of (r × c)              (rc−1)(l−1)     row-layer interactions          (r−1)(l−1)
  cross-classification with                          column-layer interactions       (c−1)(l−1)
  layer classification                               row-column-layer                (r−1)(c−1)(l−1)
                                                       second-order interactions
Residual                             n−rcl
                                                                                            (35.92)
At the second stage, each of the two SS involving the (r × c) cross-classification is
subdivided into three parts as shown on the right. The simplest method (as in 35.38)
of doing this is first to merge all columns within rows and recalculate the two SS,
which then are the required components with (r−1) and (r−1)(l−1) d.fr.; if the
merging operation is now separately applied to all rows within columns, and the two
SS again recalculated, we obtain the required components with (c−1) and (c−1)(l−1)
d.fr. The two remaining SS (with (r−1)(c−1) and (r−1)(c−1)(l−1) d.fr.) are
now obtainable as differences since the analysis is orthogonal.

The computation of the AV in the general disproportional frequencies case remains 
formidable, even with electronic computers. Pearce (1963) reviews the situation 
generally. Freeman and Jeffers (1962) give a method for the non-orthogonal three-way 
cross-classification; Stevens (1953) used an iterative method for this case. Bradu 
(1965) solves the problem for the simple case where all interactions of all orders are 
assumed zero—see also Rees (1966). 

Gabriel (1963) gives an expository review of the theory for analysing cell means,
whose variances are inversely proportional to cell sample sizes, with special reference
to the case when y is a 0-1 variable, and the cell means become proportions. An
approximate method of analysing cell means is given in Exercise 37.7.


Example 35.6 Balanced three-way cross-classification 


In the case where all cell-frequencies are equal to m > 1, it is at once obvious that
we may treat the (r × c × l) cross-classification as an (r × c) × l or an (r × l) × c or a (c × l) × r
at the first stage of calculating the AV in (35.92). It will be seen from this symmetry
that each of the three sets of main effects and each of the three sets of first-order
interactions will have its SS calculated exactly as in a two-way cross-classification table with
the third factor of classification merged. The Residual SS is clearly also unchanged in
form. In our present three-way notation, we therefore obtain the following expressions
from (35.63-4) and Exercise 35.1 for the SS corresponding to the components in
(35.92). The suffix to S now indicates its d.fr.

General mean:
$$S_1 = \Big(\sum_i\sum_j\sum_k\sum_p y_{ijkp}\Big)^2 \Big/ (rclm),$$

Row-effects:
$$S_{r-1} = \sum_i \Big(\sum_j\sum_k\sum_p y_{ijkp}\Big)^2 \Big/ (clm) - S_1,$$

Column-effects:
$$S_{c-1} = \sum_j \Big(\sum_i\sum_k\sum_p y_{ijkp}\Big)^2 \Big/ (rlm) - S_1,$$

Layer-effects:
$$S_{l-1} = \sum_k \Big(\sum_i\sum_j\sum_p y_{ijkp}\Big)^2 \Big/ (rcm) - S_1,$$

Row-column interactions:
$$S_{(r-1)(c-1)} = \sum_i\sum_j \Big(\sum_k\sum_p y_{ijkp}\Big)^2 \Big/ (lm) - S_{r-1} - S_{c-1} - S_1,$$

Row-layer interactions:
$$S_{(r-1)(l-1)} = \sum_i\sum_k \Big(\sum_j\sum_p y_{ijkp}\Big)^2 \Big/ (cm) - S_{r-1} - S_{l-1} - S_1,$$

Column-layer interactions:
$$S_{(c-1)(l-1)} = \sum_j\sum_k \Big(\sum_i\sum_p y_{ijkp}\Big)^2 \Big/ (rm) - S_{c-1} - S_{l-1} - S_1,$$

Residual:
$$S_{rcl(m-1)} = \sum_i\sum_j\sum_k\sum_p y_{ijkp}^2 - \sum_i\sum_j\sum_k \Big(\sum_p y_{ijkp}\Big)^2 \Big/ m. \qquad (35.93)$$

Since the eight components in (35.93) together with the SS for the second-order
interaction must add to $\sum_i\sum_j\sum_k\sum_p y_{ijkp}^2$, as always, we obtain by subtraction the SS:

Second-order interaction:
$$S_{(r-1)(c-1)(l-1)} = \sum_i\sum_j\sum_k \Big(\sum_p y_{ijkp}\Big)^2 \Big/ m - S_{(r-1)(c-1)} - S_{(r-1)(l-1)} - S_{(c-1)(l-1)} - S_{r-1} - S_{c-1} - S_{l-1} - S_1. \qquad (35.94)$$

We finally assemble the results of (35.92-4) into the AV table below.

Variation due to                   D.fr.                 SS defined by (35.93-4)
Row-effects                        r−1                   $S_{r-1}$
Column-effects                     c−1                   $S_{c-1}$
Layer-effects                      l−1                   $S_{l-1}$
Row-column interactions            (r−1)(c−1)            $S_{(r-1)(c-1)}$
Row-layer interactions             (r−1)(l−1)            $S_{(r-1)(l-1)}$
Column-layer interactions          (c−1)(l−1)            $S_{(c-1)(l-1)}$
Row-column-layer interactions      (r−1)(c−1)(l−1)       $S_{(r-1)(c-1)(l-1)}$
  Sub-total                        rcl−1
General mean                       1                     $S_1$
Classification                     rcl
Residual                           rcl(m−1)              $S_{rcl(m-1)}$
Total                              rclm = n              $\sum_i\sum_j\sum_k\sum_p y_{ijkp}^2$
                                                                                            (35.95)


Any of the eight rows of (35.95) forming part of the “Classification” SS may be tested
against the Residual SS, just as previously, by the ratio of its SS/d.fr. to the Residual
SS/d.fr. Each ratio has a non-central F distribution, becoming central if the
hypothesis tested holds.
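
As a numerical companion to (35.93-5), the sketch below computes the components for a balanced table held as nested lists `y[i][j][k]`, each cell containing m observations. The function name `three_way_av` and the helper `margin_ss` are hypothetical names introduced for this illustration; the formulae are those of (35.93-4).

```python
from itertools import product


def three_way_av(y):
    """Balanced r x c x l AV; y[i][j][k] is a list of m observations.

    Returns {label: (d.fr., SS)} following (35.93)-(35.95).
    """
    r, c, l, m = len(y), len(y[0]), len(y[0][0]), len(y[0][0][0])
    # Cell totals: sum over p of y_{ijkp}.
    cell = {(i, j, k): sum(y[i][j][k])
            for i, j, k in product(range(r), range(c), range(l))}

    def margin_ss(keep, divisor):
        # Sum of squared totals over the retained suffixes, divided by the
        # number of observations contributing to each total.
        totals = {}
        for idx, t in cell.items():
            key = tuple(idx[a] for a in keep)
            totals[key] = totals.get(key, 0.0) + t
        return sum(t * t for t in totals.values()) / divisor

    s1 = margin_ss((), r * c * l * m)
    s_r = margin_ss((0,), c * l * m) - s1
    s_c = margin_ss((1,), r * l * m) - s1
    s_l = margin_ss((2,), r * c * m) - s1
    s_rc = margin_ss((0, 1), l * m) - s_r - s_c - s1
    s_rl = margin_ss((0, 2), c * m) - s_r - s_l - s1
    s_cl = margin_ss((1, 2), r * m) - s_c - s_l - s1
    cells = margin_ss((0, 1, 2), m)
    s_rcl = cells - s_rc - s_rl - s_cl - s_r - s_c - s_l - s1      # (35.94)
    total = sum(v * v for (i, j, k) in cell for v in y[i][j][k])
    return {
        "rows": (r - 1, s_r),
        "columns": (c - 1, s_c),
        "layers": (l - 1, s_l),
        "row-column": ((r - 1) * (c - 1), s_rc),
        "row-layer": ((r - 1) * (l - 1), s_rl),
        "column-layer": ((c - 1) * (l - 1), s_cl),
        "row-column-layer": ((r - 1) * (c - 1) * (l - 1), s_rcl),
        "general mean": (1, s1),
        "residual": (r * c * l * (m - 1), total - cells),
        "total": (r * c * l * m, total),
    }
```

With purely additive cell means and identical replicates, every interaction SS and the Residual SS should vanish apart from rounding error, which provides a simple check on any implementation.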


Multi-way cross-classifications 

35.44 The reader should now be able to see how the three-way cross-classification
analysis can be further generalized to four- and more-way classifications by repeated
application of the argument we used to obtain the three- from the two-way analysis.
The formal symmetry of (35.95) in the balanced case, and also of (35.93-4), invites
more direct generalization to higher-order classifications. We should notice particularly
the uniform correspondence of the d.fr. attached to an SS and the number of linearly
independent parameters which it represents. This is easily seen as a consequence
of geometrical arguments of the type referred to in 35.13 above, which are discussed
in detail by Scheffé (1959).

Any such further generalization involves the definition of third- and higher-order 
interactions if these are required in the model, but these interactions tend to be so 
remote and difficult to interpret that they are frequently ignored in the subsequent 
analysis, their SS and d.fr. being merged with the Residual. 


The combination of AV tests 

35.45 Suppose first quite generally that k distinct hypotheses were to be tested,
and that their respective test statistics were all independently distributed. To obtain
a combined test of the k hypotheses, we could use the result of Exercise 16.4 in the
manner of Exercise 30.9, Vol. 2. Applying the probability-integral transformation
to each test statistic, and directing it so that critical values of each test statistic correspond
to small values of its transform $P_i$, we then have $-2\sum_{i=1}^{k} \log P_i = P$ as a $\chi^2$ variable
with 2k d.fr., large values of P being critical. Whatever the sizes of the constituent
tests, we use a size-α test on P. If the tests are not independent, however, we encounter
exactly the difficulties mentioned in the context of tests of fit in 30.36, and this combined
test is not useful since its distribution requires knowledge of the joint distribution
of the test statistics.

Another simple general approach to the combination of independent tests arises
from the observation that if the ith test has size $\alpha_i$, the probability of rejecting at least
one of the hypotheses tested when all are true is simply $1 - \prod_{i=1}^{k}(1 - \alpha_i)$, which reduces
when all $\alpha_i = \alpha$ to

$$P_k(\alpha) = 1 - (1 - \alpha)^k, \qquad (35.96)$$

approximately equal to kα when α is small, as it normally is in practice.
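
Both devices of this section are simple to compute. The sketch below is illustrative only: the function names are ours, and the tail probability uses the standard closed form for a χ² variable with an even number of d.fr., $\text{Prob}\{\chi^2_{2k} > x\} = e^{-x/2}\sum_{j=0}^{k-1}(x/2)^j/j!$.

```python
import math


def fisher_combined(p_values):
    """Combine k independent transforms P_i by -2 * sum(log P_i).

    Returns the statistic and its upper-tail probability on the
    chi-square distribution with 2k d.fr.
    """
    k = len(p_values)
    stat = -2.0 * sum(math.log(p) for p in p_values)
    half = stat / 2.0
    tail = math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k))
    return stat, tail


def overall_size(alpha, k):
    """Probability (35.96) of at least one rejection among k independent size-alpha tests."""
    return 1.0 - (1.0 - alpha) ** k


stat, tail = fisher_combined([0.04, 0.20, 0.11])   # hypothetical transforms
size_four = overall_size(0.05, 4)                  # compare with 4 * 0.05 = 0.20
```

The combined hypothesis is rejected at size α when `tail` is less than α. For a single test (k = 1) the tail probability reduces to the transform itself, as it should.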


35.46 Now if all the k tests in an AV table were independent, we could use (35.96)
to fix the overall size in testing the set of variance-ratios as a whole, so that if there
are four tests to be made at size α, and we required overall size to be 0.05, we should
solve

$$0.05 = 1 - (1 - \alpha)^4$$

for α or, approximately, put α = 0.05/4. However, the tests in the AV tables which
we have considered are never independent tests, for although the various SS in a table
may be independent of each other, all the tests we have derived use the Residual SS
as denominator of the test statistic, and the various tests must therefore be statistically
dependent, since, e.g., a Residual SS which is (by chance) large will depress the values
of all the test statistics simultaneously.
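
For the four-test illustration just given, the exact solution and the approximation may be compared directly. This is a minimal sketch; the function name `per_test_size` is hypothetical.

```python
def per_test_size(overall, k):
    """Solve overall = 1 - (1 - alpha)**k for alpha."""
    return 1.0 - (1.0 - overall) ** (1.0 / k)


exact = per_test_size(0.05, 4)    # exact solution of 0.05 = 1 - (1 - alpha)^4
approx = 0.05 / 4                 # the approximation alpha = 0.05/4
```

The exact value slightly exceeds the approximation, so that the approximate rule errs on the conservative side.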


35.47 Fortunately, however, (35.96) still holds as an approximation, as Hartley
(1955) showed. Suppose that (k+1) independent mean squares $s^2, s_1^2, \ldots, s_k^2$ are
observed, with respective d.fr. $\nu, \nu_1, \ldots, \nu_k$. We write $G, G_1, \ldots, G_k$ for their
distribution functions and $g = G'$ for the f.f. of $s^2$. Let the k values $F_j$ be defined as the
solutions for a fixed α of

$$\text{Prob}\{s_j^2/s^2 \le F_j\} = 1 - \alpha,$$

so that $F_j$ is the 100(1−α) per cent quantile of the distribution of the ratio $s_j^2/s^2$.
The probability that none of the ratios $s_j^2/s^2$ exceeds its $F_j$ is

$$P(1) = \text{Prob}\Big\{\frac{s_1^2}{s^2} \le F_1, \ldots, \frac{s_k^2}{s^2} \le F_k\Big\} = \int_0^\infty \Big\{\prod_{j=1}^{k} G_j(xF_j)\Big\}\, g(x)\,dx. \qquad (35.97)$$

We have denoted (35.97) by P(1) because it is the value at θ = 1 of the function

$$P(\theta) = \int_0^\infty \prod_{j=1}^{k} \big[(1-\alpha) + \theta\{G_j(xF_j) - (1-\alpha)\}\big]\, g(x)\,dx, \qquad (35.98)$$


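and we see at once that $P(0) = (1-\alpha)^k$. In order to expand P(1) in a Taylor series
about zero, we investigate its derivatives. We find

$$P'(\theta) = \sum_{j=1}^{k} \int_0^\infty \{G_j(xF_j) - (1-\alpha)\} \prod_{i \ne j} \big[(1-\alpha) + \theta\{G_i(xF_i) - (1-\alpha)\}\big]\, g(x)\,dx$$

so that

$$P'(0) = (1-\alpha)^{k-1} \sum_{j=1}^{k} \int_0^\infty \{G_j(xF_j) - (1-\alpha)\}\, g(x)\,dx = 0$$

since

$$\int_0^\infty G_j(xF_j)\, g(x)\,dx = 1 - \alpha. \qquad (35.99)$$

Thus the Taylor expansion is

$$\begin{aligned}
P(1) - (1-\alpha)^k &= \tfrac{1}{2} P''(\theta), \qquad 0 < \theta < 1, \\
&= \tfrac{1}{2} \sum_{i \ne j} \int_0^\infty \{G_i(xF_i) - (1-\alpha)\}\{G_j(xF_j) - (1-\alpha)\} \\
&\qquad \times \prod_{h \ne i,j} \big[(1-\alpha) + \theta\{G_h(xF_h) - (1-\alpha)\}\big]\, g(x)\,dx.
\end{aligned}$$

Every term in square brackets lies between 0 and 1, since α, θ and G do so, and if we
assume all these terms equal to 1 we therefore obtain the inequality

$$|P(1) - (1-\alpha)^k| \le \tfrac{1}{2} \sum_{i \ne j} \int_0^\infty \big|\{G_i(xF_i) - (1-\alpha)\}\{G_j(xF_j) - (1-\alpha)\}\big|\, g(x)\,dx,$$

from which the Cauchy-Schwarz inequality gives the blunter inequality

$$|P(1) - (1-\alpha)^k| \le \tfrac{1}{2} \sum_{i \ne j} \Big[\int_0^\infty \{G_i(xF_i) - (1-\alpha)\}^2 g(x)\,dx \int_0^\infty \{G_j(xF_j) - (1-\alpha)\}^2 g(x)\,dx\Big]^{1/2}$$

and, even further,

$$|P(1) - (1-\alpha)^k| \le \tfrac{1}{2} k(k-1) \max_j \int_0^\infty \{G_j(xF_j) - (1-\alpha)\}^2 g(x)\,dx,$$

and because of (35.99) we may write this

$$|P(1) - (1-\alpha)^k| \le \tfrac{1}{2} k(k-1) \max_j \big[\text{var}\{G_j(s^2 F_j)\}\big]. \qquad (35.100)$$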
35.48 Hartley (1955) has computed the upper bound given by (35.100) for a
range of values of k, ν and $\nu_j$ when (1−α) is near 0.95. It decreases to zero as ν
increases to infinity, but increases with k and $\nu_j$. For the rather unfavourable case
k = 10, ν = 30, max $\nu_j$ = 10, the upper bound is 0.0138; if ν is then increased to 60, the
bound drops to 0.0050. For k = 5, ν = 60, max $\nu_j$ = 5, the bound is 0.0014. Since
the actual difference will generally be appreciably less than the upper bound (three
separate “throwing-away” processes produced the bound), we may conclude that the
approximation of (35.97) by $(1-\alpha)^k$ is quite satisfactory over a wide range, especially
when ν (the d.fr. for the Residual SS) is large.

We therefore see that the set of k variance-ratios in an AV table is tested with size
$P_k(\alpha) = 1 - P(1)$, related approximately by (35.96) to the nominal size α at which
each variance-ratio in the table is tested. Alternatively, if we rewrite (35.96) in the
form

$$1 - \alpha = \{1 - P_k(\alpha)\}^{1/k}$$

and substitute α for $P_k(\alpha)$ and β(k) for α, we have

$$\beta(k) = 1 - (1-\alpha)^{1/k}, \qquad (35.101)$$

so that a test of size α in the set of k variance-ratios is achieved if each of them is separately
tested with size β(k) defined from α by (35.101).


Earlier work on this problem was done by Hartley (1938) and Finney (1941). Cochran 
(1941) and Darling (1952) considered the ratio of the largest mean square to the sum 
of all mean squares when all v; are equal. 


35.49 The result of 35.47-8 states that separate tests of size (35.101) on the k
variance-ratios with common Residual SS give an overall test of approximate size α.
However, the fact that the denominator SS is common to all k tests suggests that it
may be inefficient to make the tests separately in this way, since the result of any one
of the tests gives relevant information for the others. A step-by-step procedure which
utilizes this fact was suggested by Hartley (1955).

Define $H_j$ as the probability-integral transformation of $s_j^2/s^2$ to the uniform
distribution on (0, 1) (cf. 1.27), and order the $H_j$ so that $H_{(i)}$ is the ith smallest of them.
Since Prob{$H_j$ > 1−δ} = δ, a test of size δ on $H_j$ is obtained by comparing it with
1−δ.

The first step is to test $H_{(k)}$, the largest of the $H_j$, with size $1 - (1-\alpha)^{1/k} = \beta(k)$
as at (35.101). If $H_{(k)} \le 1 - \beta(k)$, the hypothesis of homogeneity of all the observations
is accepted outright; if $H_{(k)} > 1 - \beta(k)$, we reject the corresponding hypothesis of “no
effect” in the AV table, and proceed to test $H_{(k-1)}$ at size β(k−1). If $H_{(k-1)} \le
1 - \beta(k-1)$, we accept all the (k−1) remaining hypotheses of “no effect” in the table;
if $H_{(k-1)} > 1 - \beta(k-1)$, we reject the corresponding hypothesis and proceed to test $H_{(k-2)}$
at size β(k−2), and so on until some $H_{(i)} \le 1 - \beta(i)$ (when the i remaining hypotheses
are accepted) or $H_{(1)} > 1 - \beta(1)$, by which time all k “no effect” hypotheses will have
been rejected.
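
The step-by-step procedure just described may be sketched as follows. The names `beta` and `hartley_step_down` are ours, and the transformed values $H_j$ are assumed to have been obtained already from the variance-ratio distributions; the sketch is an illustration of the logic only, not Hartley's own computational scheme.

```python
def beta(i, alpha):
    """Size (35.101) at which the largest of i variance-ratios is tested."""
    return 1.0 - (1.0 - alpha) ** (1.0 / i)


def hartley_step_down(h, alpha):
    """Return indices of the 'no effect' hypotheses rejected, in order of rejection.

    h: list of probability-integral transforms H_j, each in (0, 1).
    """
    # Visit the H_j from largest to smallest.
    order = sorted(range(len(h)), key=lambda j: h[j], reverse=True)
    rejected = []
    remaining = len(h)                    # number of hypotheses not yet rejected
    for j in order:
        if h[j] > 1.0 - beta(remaining, alpha):
            rejected.append(j)
            remaining -= 1
        else:
            break                         # all remaining hypotheses are accepted
    return rejected


# Hypothetical transforms for k = 4 variance-ratios, overall size 0.05.
rejected = hartley_step_down([0.999, 0.5, 0.99, 0.2], 0.05)
```

Because β(i) increases as hypotheses are removed, each later test is made at a less stringent size than its predecessor; this is the gain over testing all k ratios separately at size β(k).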


35.50 This step-by-step test is easily shown to have size α. Suppose that of the
k variance-ratios, c correspond to “zero effects” and (k−c) to non-zero effects, and that
of the c transformed values $H_j$ corresponding to the former among the k $H_j$, $H_{(l)}$ is
the largest. If any of the “zero-effects” hypotheses are rejected by the test procedure,
that corresponding to $H_{(l)}$ must be, since it is reached first in the step-by-step process.
The probability that any “zero-effect” hypothesis is rejected is therefore

$$P = \text{Prob}\{H_{(i)} > 1 - \beta(i),\ i = l, l+1, \ldots, k\} \le \text{Prob}\{H_{(l)} > 1 - \beta(l)\}.$$

Since l ≥ c, and β(i) is a decreasing function of i, we have further

$$P \le \text{Prob}\{H_{(l)} > 1 - \beta(c)\} = \alpha, \qquad (35.102)$$

since if $H_{(l)}$, the largest of a set of c variance-ratios, is tested with size β(c) we obtain
an overall test of size α by (35.101). (35.102) shows that the step-by-step test never
has size exceeding α. If c = k, l = k also and (35.102) becomes an equality.


Hartley (1955) gives some consideration to the power of the test when all mean 
squares have equal d.fr. 


Multiple comparisons 

35.51 Each of the variance-ratio tests in an AV table tests a hypothesis concerning
a set of parameters, e.g. the row-effects or the interactions between rows and columns.
For practical purposes, however, it is often not enough to know, for example, that the
row-effects $\theta_{i*}$ are different; we need to know which of the $\theta_{i*}$ are to be regarded as
greater than the others, or more generally whether the $\theta_{i*}$ may be said to fall into
distinct groups.

Now the LS estimators $\hat{\theta}_{i*}$ are, of course, the MV unbiassed linear estimators of
their corresponding parameters, and provide us with estimators of any of the differences
$\theta_{i*} - \theta_{i'*}$, but we are usually unable to nominate the differences of interest in advance,
and we therefore are faced with the problem of carrying out a number of non-independent
tests on the differences. The discussion of 35.45-6 applies here with obvious changes,
although the problem we are now concerned with is a more detailed one. Whereas
in 35.47-50 we were dealing with combining tests on sets of parameters, we are now
interested in closer examination of a particular set, say the row-effects.

In a sense, this problem of multiple comparisons, as it is called, is a more complex
version of the problem of outlying observations, discussed in 32.23-8. Instead of
being concerned about a location-shift in one or more observations, we are now more
generally asking whether the observations (here the $\hat{\theta}_{i*}$) have expected values (the
$\theta_{i*}$) which fall into distinct groups. Tukey (1953) reviews the subject.


The LSD test 

35.52 For the sake of definiteness, our discussion will refer to a one-way
classification as in Example 35.1, although there is no essential difference if we consider
any set of effects in an AV table. In Example 35.1, the observed group means $y_{i.}$
are the LS estimators of the k group parameters $\theta_i$ (each of which includes the general
mean as a common element). If the F-test at (35.22) rejects the hypothesis (35.20)
that the $\theta_i$ are all equal, we are faced with the need to decide which subsets of the $\theta_i$
may be regarded as homogeneous, and which not.

The simplest test procedure is the oldest (“Student,” 1908), namely to carry out
an ordinary two-sample “Student's” t-test on every one of the ½k(k−1) possible
pairs of $y_{i.}$, $y_{j.}$, i ≠ j. If each of these tests has size α, we can say little about the size
of the overall combined test, since they are non-independent; no simple combination
formula like (35.96) is available.

This combined test amounts to calculating an estimated standard error of the
difference between two means (using the Residual SS) and comparing the observed
difference with it, using the appropriate (Residual SS) number of d.fr. in the “Student's”
distribution. If the number of observations in each group ($n_i$) is the same (equal to
N, say) only a single standard error estimate is needed, of form $(2s^2/N)^{1/2}$, where $s^2$
is the unbiassed estimator of the error variance $\sigma^2$. We thus set up a Least Significant
Difference (the appropriate multiple of the standard error for a two-sided test of size α)
and compare each of the ½k(k−1) observed differences with it. In consequence,
this is sometimes called the LSD test. One cannot say much more in general about
the LSD test than that if the true group means do not differ, a proportion α of all
pairs will, on the average, be wrongly adjudged heterogeneous by the test.

A simple modification of the LSD test, proposed by Fisher (1935), is to reduce
the size of each component test from α to $\alpha / \binom{k}{2}$. This has the effect of reducing
the expected number of pairs erroneously adjudged heterogeneous (when all group
means are truly equal) from ½k(k−1)α to simply α.

When using either the original or modified LSD test, it must be remembered that 
the expected error rates just referred to are unconditional ones, taking no account 
of the fact that the test is made after (and because) the overall F-test rejected the 
hypothesis of homogeneity. 
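
A minimal sketch of the LSD computation for equal group sizes follows. The function names are ours; the critical value `t_crit` of “Student's” distribution is supplied by the user (it must be read from tables with the Residual d.fr.), and the numerical values below are purely illustrative.

```python
import math
from itertools import combinations


def lsd_pairs(means, s2, N, t_crit):
    """Return the LSD and the pairs (i, j) whose means differ by more than it.

    means:  observed group means, each based on N observations
    s2:     unbiassed estimate of the error variance (Residual MS)
    t_crit: two-sided critical value of Student's t (assumed supplied from tables)
    """
    lsd = t_crit * math.sqrt(2.0 * s2 / N)
    pairs = [(i, j) for i, j in combinations(range(len(means)), 2)
             if abs(means[i] - means[j]) > lsd]
    return lsd, pairs


def fisher_modified_size(alpha, k):
    """Size of each component test in the modified LSD test: alpha / C(k, 2)."""
    return alpha / math.comb(k, 2)


# Illustrative figures only: three groups of N = 8, s2 = 4, t_crit taken as 2.08.
lsd, pairs = lsd_pairs([10.0, 12.0, 15.0], 4.0, 8, 2.08)
```

For the modified test, `t_crit` would instead be the critical value corresponding to the reduced size returned by `fisher_modified_size`, so that the LSD is larger and fewer pairs are adjudged heterogeneous.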


Step-by-step and simultaneous test procedures 

35.53 Like the outlier problem of 32.23-5, which it resembles, the multiple 
comparisons problem was often discussed in terms of sample range criteria. 

The simplest of these is Tukey's (1951, 1952) studentized range test. The k
group means, which we shall now write $\bar{x}_i$, i = 1, 2, ..., k, are (on the hypothesis
of homogeneity) a random sample of size k from a normal distribution with variance
$\sigma^2/N$, independently estimated by $s^2/N$, where N is the number of observations in
each group as before. Suppose the group means to be ordered, and denote them by
$\bar{x}_{(1)} \le \bar{x}_{(2)} \le \ldots \le \bar{x}_{(k)}$. To any pair $\bar{x}_{(i)}$ and $\bar{x}_{(j)}$, i < j, there corresponds a difference
$(\bar{x}_{(j)} - \bar{x}_{(i)})$ which is the range of a subset of (j−i+1) adjacent ordered group means.
This subset is adjudged heterogeneous (and the extreme group means $\bar{x}_{(i)}$ and $\bar{x}_{(j)}$
therefore regarded as from different populations) if $(\bar{x}_{(j)} - \bar{x}_{(i)})/(s/N^{1/2})$ exceeds the
100(1−α) per cent point, $q_\alpha$, of the studentized range (cf. 32.25) of k observations. Since
no subset range can do this unless the range $(\bar{x}_{(k)} - \bar{x}_{(1)})$ does, the procedure clearly
has overall size α.

In practice, #0) is successively compared with %), Ža), and so on until a “ homo- 
geneous” verdict is reached. If (%j)—%) is homogeneous, the test ends there; 
otherwise all means ža), ža» ... found to be heterogencous from 3, are tested against 
3. again in succession starting from a). All means then found to be heterogeneous 
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from œ- are tested against Xy 9, and so on until no further “ heterogeneous ” 
verdict can be reached. Such procedures, in which the decision on each subset depends 
on previous decisions concerning a larger subset, are called step-by-step or stepwise 
procedures. 
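
The sweep just described can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name and its arguments are ours, and the critical value `q_crit` (the upper $100(1-\alpha)$ per cent point of the studentized range of k observations) is taken as given from tables rather than computed.

```python
from math import sqrt


def tukey_heterogeneous_pairs(means, s, N, q_crit):
    """Return pairs (i, j) of positions in the ordered means judged heterogeneous.

    means  : the k group means (any order; they are sorted internally)
    s      : residual standard deviation estimate
    N      : common number of observations per group
    q_crit : tabulated studentized-range point for k observations (assumed given)
    """
    ordered = sorted(means)
    k = len(ordered)
    allowance = q_crit * s / sqrt(N)  # the same allowance for every comparison
    pairs = []
    for j in range(k - 1, 0, -1):     # start from the largest mean
        for i in range(j):            # compare with the smallest, next smallest, ...
            if ordered[j] - ordered[i] > allowance:
                pairs.append((i, j))
            else:
                break                 # "homogeneous" verdict: end this sweep
    return pairs
```

Because the allowance is fixed, ending a sweep at the first homogeneous verdict loses nothing: every remaining difference in that sweep is smaller still.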


35.54 A different simultaneous test procedure employing a sum-of-squares, rather
than a range, technique is suggested by Gabriel (1964). The “between-groups”
SS (35.21) is calculated for every one of the $(2^k - k - 1)$ subsets of two or more of the
k groups, and tested against the fixed critical value obtained from the variance-ratio
distribution with $(k-1, n-k)$ d.fr. The sample sizes $n_i$ need not now be equal.
This procedure leads to transitive judgements in the sense that no subset can be adjudged 
heterogeneous when a larger subset containing it is not; i.e. no subset can be adjudged 
homogeneous unless all smaller subsets which it contains are also homogeneous (cf. 
Exercise 35.16). Clearly, this procedure contains the ordinary variance-ratio test as 
a component, when the subset is actually the whole set of k groups. This implies 
that the test has overall size $\alpha$.

When all the $n_i$ are equal, Tukey’s test in 35.53 can be modified to be a “simultaneous
test procedure” of the same type as Gabriel’s if every subset of group means, and not
merely every subset of adjacent group means, is tested by the range criterion. (This
test has the additional property that if any set of more than two groups is adjudged 
heterogeneous, at least one subset of it would be.) 

Both the Gabriel and the Tukey-Gabriel methods discussed in this section have 
the property that the probability of erroneously judging a subset to be heterogeneous 
decreases with the size of the subset. For k = 8 and 40 d.fr. for the Residual SS,
this phenomenon is more marked for the former method; Gabriel (1964) gives tables.
The Tukey-Gabriel method is much simpler computationally, especially for large k,
but is only available when all $n_i$ are equal.
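
A minimal Python sketch of the sum-of-squares procedure of this section is given below. The function names are hypothetical, and the critical value `f_crit` of the variance-ratio distribution with $(k-1, n-k)$ d.fr. is assumed to be supplied from tables. Every subset is referred to the same threshold $(k-1)s^2F_\alpha$, which is what the whole-set variance-ratio test uses.

```python
from itertools import combinations


def between_groups_ss(sizes, means):
    """Between-groups sum of squares for the groups supplied."""
    n = sum(sizes)
    grand = sum(ni * m for ni, m in zip(sizes, means)) / n
    return sum(ni * (m - grand) ** 2 for ni, m in zip(sizes, means))


def gabriel_verdicts(sizes, means, s2, f_crit):
    """Map each subset of two or more groups to True if adjudged heterogeneous.

    s2     : residual mean square
    f_crit : tabulated F point with (k - 1, n - k) d.fr. (assumed given)
    """
    k = len(means)
    threshold = (k - 1) * s2 * f_crit   # fixed for every subset
    verdicts = {}
    for size in range(2, k + 1):
        for subset in combinations(range(k), size):
            ss = between_groups_ss([sizes[i] for i in subset],
                                   [means[i] for i in subset])
            verdicts[subset] = ss > threshold
    return verdicts
```

With k groups the dictionary has $2^k - k - 1$ entries, and the transitivity property can be checked directly on it.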


35.55 Instead of using the fixed critical value of the k-observations studentized
range, as in 35.53, we can test $(\bar{x}_{(j)} - \bar{x}_{(i)})/(s/N^{1/2})$ against the studentized range of
$(j-i+1)$ observations, as suggested earlier by Newman (1939) and Keuls (1952).
A new point now arises, for a set of q adjacent (ordered) group means may be declared
“ homogeneous ” while a subset of p adjacent group means, contained within that set 
of q, is “ heterogeneous ” by this criterion. The Newman-Keuls step-by-step pro- 
cedure adjudges a pair of group means heterogeneous only if every subset of adjacent 
group means containing that pair is heterogeneous by the studentized range test just 
defined, which takes account of the number in the subset. 
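
The rule just stated can be sketched in Python as follows. The function and argument names are illustrative assumptions; `q_crit_by_size[p]` stands for the tabulated upper $100(1-\alpha)$ per cent point of the studentized range of p observations and must be supplied by the user.

```python
from math import sqrt


def newman_keuls_pairs(means, s, N, q_crit_by_size):
    """Pairs (i, j) of positions in the ordered means adjudged heterogeneous.

    A pair is heterogeneous only if every subset of adjacent ordered means
    containing it exceeds the critical value appropriate to its own size.
    """
    ordered = sorted(means)
    k = len(ordered)
    scale = s / sqrt(N)

    def subset_heterogeneous(a, b):
        # studentized range of the (b - a + 1) adjacent ordered means a..b
        return (ordered[b] - ordered[a]) / scale > q_crit_by_size[b - a + 1]

    pairs = []
    for i in range(k):
        for j in range(i + 1, k):
            if all(subset_heterogeneous(a, b)
                   for a in range(i + 1) for b in range(j, k)):
                pairs.append((i, j))
    return pairs
```

This direct enumeration gives the same verdicts as the step-by-step sweep; it is written this way only to make the defining condition explicit.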

The computational procedure is just as in the last paragraph of 35.53, except that
the critical value in the studentized range test now varies in the component tests,
instead of being fixed as previously. Once again, the overall size is at once seen to be
$\alpha$, since $(\bar{x}_{(k)} - \bar{x}_{(1)})$ must first be adjudged heterogeneous if any other difference is to be.

D. B. Duncan (1952, 1955, 1957) proposes what is essentially a modification of the
Newman-Keuls procedure in which each difference $(\bar{x}_{(j)} - \bar{x}_{(i)})$ is tested against the
$100(1-\alpha_{j-i+1})$ per cent point of the studentized range of $(j-i+1)$ observations, where
$\alpha_{j-i+1}$ now depends on $(j-i)$ through the relation
$$\alpha_{j-i+1} = 1-(1-\alpha)^{j-i}, \tag{35.103}$$
the argument stemming from (35.96) and the consideration that a test of $(j-i+1)$
observations is equivalent to $(j-i)$ separate tests on pairs of observations.
Thus the probabilities of error are redistributed among the components of the test
procedure, falling as the group means compared are closer together in the ordering
(cf. 35.54). D. B. Duncan (1955) provides tables for overall test size $\alpha = 0.05$ and $0.01$.
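
The relation (35.103) is easily evaluated. The short Python function below is a sketch only; its name, and the use of p for the number $(j-i+1)$ of adjacent ordered means spanned, are our own notation.

```python
def duncan_level(alpha, p):
    """Level used for a range of p adjacent ordered means, following (35.103)."""
    if p < 2:
        raise ValueError("a range needs at least two means")
    return 1.0 - (1.0 - alpha) ** (p - 1)
```

For a pair (p = 2) the level is $\alpha$ itself, and it rises with p, so that wider ranges are tested less stringently than in the Newman-Keuls procedure.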


35.56 D. B. Duncan (1955), Tukey (1953) and Scheffé (1959) discuss some other 
multiple comparison procedures, and Harter (1957) makes some power comparisons, 
while Hartley (1955) briefly considers the power of the Newman-Keuls method. 
Gabriel (1964) compares the Newman-Keuls and Duncan step-by-step methods with 
the two simultaneous test procedures discussed in 35.54, and proves that for given 
overall test size and any step-by-step method, the probability of erroneously judging 
any subset heterogeneous cannot be less than for the simultaneous test procedure 
based on the same statistic. 


McDonald and Thompson (1967) develop rank sum multiple comparisons methods 
for one- and two-way classifications. 


Simultaneous confidence intervals for differences and contrasts 

35.57 The studentized range test of 35.53 leads immediately to simultaneous
confidence intervals for all $\tfrac{1}{2}k(k-1)$ differences between true group means $(\theta_i-\theta_j)$,
proposed by Tukey (1951). For whatever the $\theta_i$ may be, the random variables $\bar{x}_i-\theta_i$
are identically and independently normally distributed with zero means and variances
$\sigma^2/N$, and the probability is $1-\alpha$ that their studentized range will not exceed $q_\alpha$, defined
in 35.53. It follows that simultaneously for all $i \ne j$,
$$\operatorname{Prob}\left\{\left|\frac{(\bar{x}_i-\theta_i)-(\bar{x}_j-\theta_j)}{s/N^{1/2}}\right| \le q_\alpha\right\} = 1-\alpha, \tag{35.104}$$
so that
$$(\bar{x}_i-\bar{x}_j) - q_\alpha s/N^{1/2} \le \theta_i-\theta_j \le (\bar{x}_i-\bar{x}_j) + q_\alpha s/N^{1/2} \tag{35.105}$$
is simultaneously satisfied for all $i \ne j$ with probability $1-\alpha$.
Exercise 35.11 shows that the method extends to negatively equi-correlated
multinormal $\bar{x}_i$.
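
As an illustration of (35.105), the following Python sketch returns the interval for every pair of groups. The function name is hypothetical and the studentized-range point `q_crit` is assumed to be read from tables.

```python
from itertools import combinations
from math import sqrt


def tukey_intervals(means, s, N, q_crit):
    """Simultaneous intervals for all differences theta_i - theta_j, i < j."""
    half_width = q_crit * s / sqrt(N)   # identical for every difference
    return {
        (i, j): (means[i] - means[j] - half_width,
                 means[i] - means[j] + half_width)
        for i, j in combinations(range(len(means)), 2)
    }
```

Every interval has the same length, which reflects the equal-frequency assumption on which the argument rests.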


35.58 The method of 35.57 enables us to make simultaneous statements about
all $\tfrac{1}{2}k(k-1)$ differences $\theta_i-\theta_j$ with a known overall confidence coefficient $1-\alpha$. In
many applications of AV, we are interested not only in the differences but also in
other linear combinations of the $\theta_i$ with constant coefficients summing to zero. Such
a linear combination is called a contrast, defined by
$$\psi = \sum_{i=1}^{k} c_i\theta_i, \qquad \sum_{i=1}^{k} c_i = 0. \tag{35.106}$$
The most obviously useful contrast other than the difference between any $\theta_i$ and $\theta_j$
is the difference between the average of any subset of p of the k parameters $\theta_i$ and the
average of the $k-p$ others. Interactions (defined in 35.18, 35.41) are also at once seen
to be contrasts.

The method of 35.57 is easily adapted so that every contrast, and not merely every
difference, of the $\theta_i$ is simultaneously covered by an interval. Since the number of
contrasts is infinite, the resulting gain in generality is considerable.


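35.59 Write $z_i = \bar{x}_i-\theta_i$ and let $\sum_{i=1}^{k} c_i = 0$. Consider the maximum possible
value of $\sum_i c_i z_i$. Since $\sum_i c_i = 0$, the sum of the positive $c_i$ is $\tfrac{1}{2}\sum_i |c_i|$, as is also the
sum of the absolute values of the negative $c_i$. We therefore see that
$$\left|\sum_i c_i z_i\right| \le \left(\tfrac{1}{2}\sum_i |c_i|\right) \max_{i,j} |z_i - z_j|,$$
i.e.
$$\left|\sum_i c_i(\bar{x}_i-\theta_i)\right| \le \left(\tfrac{1}{2}\sum_i |c_i|\right) \max_{i,j} \left|(\bar{x}_i-\theta_i)-(\bar{x}_j-\theta_j)\right|. \tag{35.107}$$
Referring back to (35.104), we see that (35.107) implies that for any choice of the $c_i$
with $\sum_i c_i = 0$,
$$\operatorname{Prob}\left\{\left|\frac{\sum_i c_i(\bar{x}_i-\theta_i)}{s/N^{1/2}}\right| \le q_\alpha\,\tfrac{1}{2}\sum_i |c_i|\right\} = 1-\alpha,$$
and hence that
$$\sum_i c_i\bar{x}_i - q_\alpha\frac{s}{N^{1/2}}\,\tfrac{1}{2}\sum_i |c_i| \le \sum_i c_i\theta_i \le \sum_i c_i\bar{x}_i + q_\alpha\frac{s}{N^{1/2}}\,\tfrac{1}{2}\sum_i |c_i| \tag{35.108}$$
is simultaneously satisfied for all contrasts $\psi = \sum_i c_i\theta_i$, with probability $1-\alpha$. The
method again generalizes to negatively equi-correlated multinormal $\bar{x}_i$ (cf. Exercise
35.11).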
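
The interval (35.108) for a single nominated contrast may be sketched in Python as below. The function name and the tolerance used to check that the coefficients sum to zero are our own assumptions, and `q_crit` is again a tabulated studentized-range point.

```python
from math import sqrt


def tukey_contrast_interval(coeffs, means, s, N, q_crit):
    """Interval (35.108) for the contrast sum(c_i * theta_i)."""
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("the coefficients of a contrast must sum to zero")
    estimate = sum(c * m for c, m in zip(coeffs, means))
    # allowance: q * s / sqrt(N), scaled by half the sum of |c_i|
    half_width = q_crit * (s / sqrt(N)) * 0.5 * sum(abs(c) for c in coeffs)
    return estimate - half_width, estimate + half_width
```

For a simple difference, with coefficients (1, -1, 0, ...), the scaling factor is unity and the interval reduces to (35.105).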


35.60 In 35.59, simultaneous confidence intervals for all contrasts were obtained 
from intervals for all differences by the use of a rather wasteful inequality. It is not 
surprising, therefore, that in general these are not the most useful intervals for all 
contrasts. To obtain a more useful set, we make an entirely different approach. 

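The estimator of any contrast (35.106) is
$$\hat{\psi} = \sum_{i=1}^{k} c_i\hat{\theta}_i. \tag{35.109}$$
Clearly,
$$E(\hat{\psi}) = \psi,$$
and, further, if the $\hat{\theta}_i$ are normally distributed, so will $\hat{\psi}$ be. If we now consider any
set of r (< k) estimated contrasts, which we write in the form $\hat{\boldsymbol{\psi}} = \mathbf{C}\hat{\boldsymbol{\theta}}$, it will be
multinormally distributed (cf. 15.4, Vol. 1) with mean vector equal to $\boldsymbol{\psi} = \mathbf{C}\boldsymbol{\theta}$ and
dispersion matrix
$$\mathbf{V} = \mathbf{V}(\hat{\boldsymbol{\psi}}) = \mathbf{C}\,\mathbf{V}(\hat{\boldsymbol{\theta}})\,\mathbf{C}'. \tag{35.110}$$
In our present discussion of the one-way classification with equal frequencies N,
$\mathbf{V}(\hat{\boldsymbol{\theta}})$ is diagonal, with elements $\sigma^2/N$, so that
$$\mathbf{V} = \frac{\sigma^2}{N}\,\mathbf{C}\mathbf{C}'. \tag{35.111}$$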
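We assume that $\mathbf{V}$ is non-singular.


35.61 The result of 15.10 now implies that the quadratic form
$$Q = (\hat{\boldsymbol{\psi}}-\boldsymbol{\psi})'\,\mathbf{V}^{-1}\,(\hat{\boldsymbol{\psi}}-\boldsymbol{\psi})$$
has a $\chi^2$ distribution with degrees of freedom equal to r, the rank of $\mathbf{V}$. Independently
of Q, the Residual SS (divided by $\sigma^2$) is also distributed in this form with, say, $\nu$ d.fr.
Thus the ratio
$$F = (Q/r)/(s^2/\sigma^2),$$
where $s^2 = (\text{Residual SS})/\nu$ as usual, has the variance-ratio distribution with $(r, \nu)$ d.fr.
In the simplest case, (35.111), this gives the statistic
$$F = (\hat{\boldsymbol{\psi}}-\boldsymbol{\psi})'\,(\mathbf{C}\mathbf{C}')^{-1}\,(\hat{\boldsymbol{\psi}}-\boldsymbol{\psi})/(r s^2/N).$$
If we call the $100(1-\alpha)$ per cent point of this F-distribution $F_{\alpha;\,r,\nu}$, we now have
$$\operatorname{Prob}\left\{(\hat{\boldsymbol{\psi}}-\boldsymbol{\psi})'\,(\mathbf{C}\mathbf{C}')^{-1}\,(\hat{\boldsymbol{\psi}}-\boldsymbol{\psi})/(s^2/N) \le rF_{\alpha;\,r,\nu}\right\} = 1-\alpha. \tag{35.112}$$
The corresponding general result is at once available by using (35.110) instead of
(35.111).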


35.62 Since $\mathbf{V}$ must be non-singular, its rank r cannot exceed q, the number
of linearly independent comparisons possible among the $\theta_i$ ($k-1$ in the one-way
classification; see Example 35.1), which is equal to the d.fr. for their SS in the AV
table. (35.112) with r = q holds for any set of q linearly independent contrasts we
may choose, but this does not imply that it holds for every such set simultaneously.
However, Scheffé (1953, 1959) showed by geometrical methods that it does imply
for every single contrast $\psi$ simultaneously that
$$\operatorname{Prob}\left\{(\hat{\psi}-\psi)^2 \le qF_{\alpha;\,q,\nu}\,\hat{V}(\hat{\psi})\right\} = 1-\alpha. \tag{35.113}$$
Here, $\hat{V}(\hat{\psi})$ is the estimated variance of $\hat{\psi}$, in which $\sigma^2$ is estimated by $s^2$ with $\nu$ d.fr.
Cf. Exercise 35.12 for an analytic proof and Exercise 35.19 for an extremely simple
algebraic one.

Scheffé (1953, 1959) went on to show numerically that the intervals for all contrasts
yielded by (35.113) are generally shorter than those obtained from (35.108) unless
the contrasts happen to be differences (for which (35.108) reduces to (35.105), designed
specifically for differences) or otherwise have very few non-zero $c_i$. Moreover,
Scheffé’s method is not restricted by the need to have $(\bar{x}_i-\theta_i)$ distributed with equal
variances, an assumption fundamental to the argument of 35.57.
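
A sketch of the interval implied by (35.113), for the one-way classification with possibly unequal group sizes, is given below in Python. The names are illustrative assumptions; `f_crit` denotes the tabulated point $F_{\alpha;\,q,\nu}$ and is not computed here. The estimated variance of the contrast is taken as $s^2\sum_i c_i^2/n_i$.

```python
from math import sqrt


def scheffe_interval(coeffs, means, sizes, s2, q, f_crit):
    """Interval for one contrast, valid simultaneously for all contrasts.

    s2     : residual mean square
    q      : number of linearly independent contrasts (k - 1 in the one-way case)
    f_crit : tabulated F point with (q, nu) d.fr. (assumed given)
    """
    estimate = sum(c * m for c, m in zip(coeffs, means))
    var_hat = s2 * sum(c * c / n for c, n in zip(coeffs, sizes))
    half_width = sqrt(q * f_crit * var_hat)
    return estimate - half_width, estimate + half_width
```

Unlike the allowance in (35.108), the half-width here depends on the contrast through its estimated variance, which is why unequal group sizes cause no difficulty.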


35.63 If we now reconsider the variance-ratio test of the overall hypothesis that
all the $\theta_i$ are equal ((35.22) in the one-way classification case), we see that this
hypothesis (cf. (35.20)) states that q linearly independent contrasts are all zero. This
implies that all contrasts are zero, for every contrast may be regarded as a linear
combination of the q linearly independent ones. Thus the overall test is logically equivalent
to testing the hypothesis that each of the infinite number of possible contrasts is zero,
i.e. seeing whether at least one of the infinite number of intervals given by (35.113)
does not cover the value zero. (See also Exercises 35.12 and 35.19.) This property
extends at once to Gabriel’s (1964) simultaneous test procedure in 35.54: a subset 
will be adjudged heterogeneous by that method if and only if some contrast within 
the subset has interval (35.113) not covering zero.

This is the main use of Scheffé’s all-contrasts method: once the overall test has
rejected the homogeneity hypothesis, the all-contrasts method may be used to examine 
any contrasts to reveal whether they are in fact the reasons for rejecting the hypothesis, 
and to calculate confidence intervals for them—they need not be nominated in advance. 
A natural way of seeking the contrasts which are to “ blame ” for the rejection of overall 
homogeneity is to start with all $\tfrac{1}{2}k(k-1)$ differences. All this may be done without
affecting the size of the overall test. If the reader will now refer back to the original 
discussion of the purposes of multiple comparisons in 35.51, he will probably agree 
that Scheffé’s all-contrasts method is very close to achieving those purposes. 

Gabriel (1967) gives a general theory of simultaneous test procedures.


35.64 Dunn (1961) considers a procedure intermediate between setting confidence 
intervals for a single contrast and setting them for all contrasts. Her method requires 
the prior nomination of m contrasts as of special interest. The intervals obtained 
(based on “ Student's ” t-statistic) are shorter than those obtained from either Tukey’s 
or Scheffé’s method if k (the number of parameters) exceeds 2 and m is not too large;
this advantage increases as k, or the number of d.fr. for Residual, or the confidence
coefficient $1-\alpha$, increases. The very simple result which underlies this method is
given in Exercise 35.14. The procedure is improved by Siotani (1964). 
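
The inequality of Exercise 35.14 on which the method rests can be illustrated by a short Python sketch. Both function names are our own, and the equal division of $\alpha$ among the m nominated contrasts is an assumption made for the illustration.

```python
def joint_lower_bound(p_single, m):
    """Lower bound 1 - m(1 - P_1) for the probability that m events all occur."""
    return max(0.0, 1.0 - m * (1.0 - p_single))


def per_contrast_alpha(alpha, m):
    """Level for each of m nominated contrasts so that the joint bound is 1 - alpha."""
    return alpha / m
```

For example, with m = 5 contrasts each covered with probability 0.99, the joint coverage is at least 0.95, whatever the dependence among the intervals.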

Bohrer (1967) improves on Scheffé's method in the case when the contrasts are of 
variables known to be all non-negative. 


Ordered and metrical classifications 

35.65 Throughout this chapter, the classification variables have been quite general,
no assumption having been made about whether, e.g., the groups in a classification 
are ordered in any way. However, if precise information is available concerning the 
basis of the classification, the SS in the AV table can be further partitioned into corre- 
sponding components. For example, if it is known that the groups correspond to 
equally-spaced values of an underlying variable, the orthogonal polynomials discussed 
in 28.18-20, Vol. 2, may be used to assign a single d.fr. to the linear, quadratic, cubic, 
and higher-degree effects of the classifying variable, if necessary proceeding until 
all $(k-1)$ d.fr. are exhausted. The method used is precisely that of Example 28.3.
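
As an illustration of such a partition, the Python sketch below splits the between-groups SS for four equally spaced groups of common size N into linear, quadratic and cubic components, each with a single d.fr. The coefficients are the standard orthogonal polynomial values for four levels; the function and constant names are our own.

```python
# Standard orthogonal polynomial coefficients for four equally spaced levels.
ORTHOGONAL_POLY_4 = {
    "linear": (-3, -1, 1, 3),
    "quadratic": (1, -1, -1, 1),
    "cubic": (-1, 3, -3, 1),
}


def polynomial_components(means, N, contrasts=ORTHOGONAL_POLY_4):
    """SS (1 d.fr. each) attributable to each polynomial effect.

    For a contrast with coefficients c_i, the SS is
    N * (sum c_i * mean_i)^2 / sum c_i^2.
    """
    return {
        name: N * sum(c * m for c, m in zip(cs, means)) ** 2 / sum(c * c for c in cs)
        for name, cs in contrasts.items()
    }
```

Because the three contrasts are mutually orthogonal and exhaust the 3 d.fr., their SS add up to the between-groups SS.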

In more complex classifications, interactions as well as row- (or other) effects 
may be partitioned in this way if all the underlying variables are equally spaced. Com- 
putational methods are given by R. L. Anderson and Bancroft (1952) and in Fisher 
and Yates’ Tables. 


35.66 Bartholomew (1961) considers the case of an ordered classification as an 
alternative to the hypothesis of homogeneity of the groups. This is precisely the situation
discussed in 31.74. When the $n_i$ are all equal, the distribution-free test based on
(31.151) is found to have higher power asymptotically than the LR test if the $\theta_i$ are
equally spaced, but to be less powerful if, at the other extreme, all the $\theta_i$ are equal
except one. See Exercise 35.15, and also the paper by Chacko (1963). Shorack (1967) 
extends the results to more complex classifications, including also a discussion of 
distribution-free tests for ordered alternatives.


Analysis of covariance

35.67 A natural extension of AV arises when, in an analysis of classified data 
such as we have been discussing in this chapter, we have available to us not only the 
observations on y but also the values of one or more further variables x, known or
suspected to influence the value of y. If the data were not classified, we should carry out here
an ordinary regression analysis of y on the x’s, but what we wish to investigate now is
the joint effect upon y of the classification (possibly a complex one) and of the measured
variables x. There is more than one possible purpose for such an analysis.

Commonly, the values of the x’s are to be eliminated by regression methods, so 
that we may be free to analyse the effect of the classification upon y after discounting 
the influence of x—for example, when x is the value on an earlier occasion (before 
the treatments giving rise to the classification) of y itself. Thus, if the effect of different
teaching methods upon children’s performance in a school subject is to be measured,
their initial levels of performance would be the values of x, and their final levels the
values of y; an alternative would be to take a measure of general intelligence as the
value of x in the analysis; or it might be thought worth while to include both the initial
level of performance and the intelligence measure as x-variables. We may describe 
this as an elimination-motivated analysis. Its methods are to ensure “fair” com- 
parisons among treatments and also, by removing unwanted variation due to x, the 
reduction of residual variation. 

It will obviously make interpretation of results simpler if, as in our example, x is 
unaffected by the treatment yielding the classification, but the analysis may be carried 
out in any case. 

However, we may, alternatively or additionally, be interested in whether the regres- 
sion of y on x is affected by the classification at all; in our example, we might ask whether 
the regression of final performance-level upon initial performance-level is the same 
whichever method is used. Our motive for the analysis is not now elimination, but 
intrinsic interest in the relationship between the variables. 


35.68 This branch of the subject has come to be called Analysis of Covariance, 
because the regression calculations involve partitioning sums of products of y and x 
in the same way as ordinary AV involves the partitioning of SS. The variables x are 
usually called concomitant variables, implying that y is of prior interest. 

An extended expository review of uses of the Analysis of Covariance is contained 
in a set of seven papers in the September 1957 issue of Biometrics (Vol. 13, No. 3). 
A clear introductory account is given by D. R. Cox (1958). We shall be concerned
purely with its theoretical aspects. 


35.69 Since we have already seen that regression analysis and AV can each be 
treated within the framework of a linear model, it is evident that Analysis of Covariance, 
a mixture of the two, can be so treated. The interpretative convenience of having
the concomitant variables unaffected by the treatments is now seen to be a special 
case of the convenience of having different sets of regressors uncorrelated in linear 
models. One can therefore set up a linear model ab initio for any situation requiring 
an Analysis of Covariance.

However, this tedious process can be avoided if the AV which is (so to speak)
embedded within the Analysis of Covariance is of a known form; we can then extend 
an AV to produce an associated Analysis of Covariance, by extending the linear model 
appropriately. Moreover, it transpires that this process of extension is a quite general 
one, enabling us to introduce extra parameters into an existing linear model. As its 
heading indicates, the algebraic discussion which follows has, therefore, no limitation 
to AV situations. 


The extension of a linear model to include further parameters 
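35.70 Suppose that a linear model (singular or non-singular)
$$\mathbf{y} = \mathbf{X}\boldsymbol{\theta}+\boldsymbol{\epsilon} \tag{35.114}$$
has been fitted to the observations, and that the LS estimator of $\boldsymbol{\theta}$ obtained by the
methods of Chapter 19 is
$$\hat{\boldsymbol{\theta}} = \mathbf{T}\mathbf{y}, \tag{35.115}$$
where of course $\mathbf{T} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ in the non-singular case. We find that the Residual
SS is
$$(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\theta}})'(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\theta}}) = \{(\mathbf{I}-\mathbf{X}\mathbf{T})\mathbf{y}\}'\{(\mathbf{I}-\mathbf{X}\mathbf{T})\mathbf{y}\} = \mathbf{y}'(\mathbf{I}-\mathbf{X}\mathbf{T})\mathbf{y}, \tag{35.116}$$
since $\mathbf{T}\mathbf{X} = \mathbf{I}$ is the condition for unbiassedness of $\hat{\boldsymbol{\theta}}$. The matrix $(\mathbf{I}-\mathbf{X}\mathbf{T})$ is
idempotent.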


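35.71 Now consider the extended model
$$\mathbf{y} = \mathbf{X}\boldsymbol{\theta}+\mathbf{Z}\boldsymbol{\beta}+\boldsymbol{\epsilon}, \tag{35.117}$$
or
$$\mathbf{y}-\mathbf{Z}\boldsymbol{\beta} = \mathbf{X}\boldsymbol{\theta}+\boldsymbol{\epsilon}.$$
If $\boldsymbol{\beta}$ were known, this would have the solution, from (35.114-15),
$$\hat{\boldsymbol{\theta}} = \mathbf{T}(\mathbf{y}-\mathbf{Z}\boldsymbol{\beta}), \tag{35.118}$$
and thus the LS solution of (35.117) for $\hat{\boldsymbol{\theta}}$ and $\hat{\boldsymbol{\beta}}$ may be obtained by solving for $\hat{\boldsymbol{\beta}}$
alone $\mathbf{y} = \mathbf{X}\mathbf{T}(\mathbf{y}-\mathbf{Z}\boldsymbol{\beta})+\mathbf{Z}\boldsymbol{\beta}+\boldsymbol{\epsilon}$, or
$$(\mathbf{I}-\mathbf{X}\mathbf{T})\mathbf{y} = (\mathbf{I}-\mathbf{X}\mathbf{T})\mathbf{Z}\boldsymbol{\beta}+\boldsymbol{\epsilon}. \tag{35.119}$$
(35.119) is a linear model, and we assume that it is non-singular. We therefore have,
from Chapter 19,
$$\hat{\boldsymbol{\beta}} = \{\mathbf{Z}'(\mathbf{I}-\mathbf{X}\mathbf{T})\mathbf{Z}\}^{-1}\mathbf{Z}'(\mathbf{I}-\mathbf{X}\mathbf{T})\mathbf{y}, \tag{35.120}$$
$$\mathbf{V}(\hat{\boldsymbol{\beta}}) = \sigma^2\{\mathbf{Z}'(\mathbf{I}-\mathbf{X}\mathbf{T})\mathbf{Z}\}^{-1}. \tag{35.121}$$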


35.72 The reduction in the Residual SS due to the extension of the model is
$$\{(\mathbf{I}-\mathbf{X}\mathbf{T})\mathbf{Z}\hat{\boldsymbol{\beta}}\}'\{(\mathbf{I}-\mathbf{X}\mathbf{T})\mathbf{Z}\hat{\boldsymbol{\beta}}\} = \hat{\boldsymbol{\beta}}'\mathbf{Z}'(\mathbf{I}-\mathbf{X}\mathbf{T})\mathbf{y}. \tag{35.122}$$

On comparing this with (35.116), we see that they embody the same matrix $(\mathbf{I}-\mathbf{X}\mathbf{T})$;
(35.122) differs only in that $\hat{\boldsymbol{\beta}}'\mathbf{Z}'$ replaces $\mathbf{y}'$ in premultiplying this matrix. This simplifies
the computation of (35.122), since we have to replace a quadratic form in $\mathbf{y}$ by a set
of corresponding bilinear forms, obtained by each column of $\mathbf{Z}$ in turn replacing $\mathbf{y}'$
in (35.116). These bilinear forms, assembled into a column vector, are premultiplied
by $\hat{\boldsymbol{\beta}}'$ to obtain (35.122).

The difference between (35.116) and (35.122),
$$(\mathbf{y}-\mathbf{Z}\hat{\boldsymbol{\beta}})'(\mathbf{I}-\mathbf{X}\mathbf{T})\mathbf{y}, \tag{35.123}$$
is the Residual SS when the extended model (35.117) has been fitted.

It is easy to see that a reduction of exactly the same form as (35.122) applies to the
minimized SS under any constraints upon the elements of $\boldsymbol{\theta}$: the only change is that
$(\mathbf{I}-\mathbf{X}\mathbf{T})$ is replaced by the matrix $\mathbf{Q}$ of the quadratic form of the minimized SS.
The analogues of (35.122-3) are then $\hat{\boldsymbol{\beta}}'\mathbf{Z}'\mathbf{Q}\mathbf{y}$ and $(\mathbf{y}-\mathbf{Z}\hat{\boldsymbol{\beta}})'\mathbf{Q}\mathbf{y}$ respectively.
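
The results of 35.70-2 may be illustrated for the simplest case, a one-way classification extended by a single concomitant variable z. In that case $(\mathbf{I}-\mathbf{X}\mathbf{T})$ simply replaces each observation by its deviation from its group mean. The Python sketch below rests on that assumption; the function names and the returned dictionary keys are our own.

```python
def within_group_deviations(values, groups):
    """Apply (I - XT) for a one-way classification: subtract each group's mean."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def extend_one_way_model(y, z, groups):
    """Fit y = group effect + beta * z + error by the extension method."""
    ry = within_group_deviations(y, groups)      # (I - XT) y
    rz = within_group_deviations(z, groups)      # (I - XT) z
    szz = sum(a * a for a in rz)                 # z'(I - XT) z
    szy = sum(a * b for a, b in zip(rz, ry))     # z'(I - XT) y, a sum of products
    syy = sum(b * b for b in ry)                 # Residual SS (35.116)
    if szz == 0:
        raise ValueError("z does not vary within groups")
    beta = szy / szz                             # (35.120)
    reduction = beta * szy                       # (35.122)
    return {
        "beta": beta,
        "residual_before": syy,
        "reduction": reduction,
        "residual_after": syy - reduction,       # (35.123)
    }
```

The quantity `szy` is a within-groups sum of products, which is where the name Analysis of Covariance comes from.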


35.73 The application of the results of 35.70-2 to Analysis of Covariance is 
immediate. Corresponding to the SS which emerge in AV situations, (35.122) pro- 
duces sums of products. Computational instructions and examples are given by 
Scheffé (1959), and worked examples by R. L. Anderson and Bancroft (1952) and in 
the September 1957 Biometrics issue. 


EXERCISES 


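35.1 Verify that if all $n_{ij} = m$, the SS in (35.63-4) become
$$S_1 = \Big(\sum_i\sum_j\sum_p y_{ijp}\Big)^2\Big/(rcm),$$
$$S_2 = \sum_i\Big(\sum_j\sum_p y_{ijp}\Big)^2\Big/(cm) - S_1,$$
$$S_3 = \sum_j\Big(\sum_i\sum_p y_{ijp}\Big)^2\Big/(rm) - S_1,$$
$$S_4 = \sum_i\sum_j\Big(\sum_p y_{ijp}\Big)^2\Big/m - (S_1+S_2+S_3),$$
$$S_R = \sum_i\sum_j\sum_p y_{ijp}^2 - (S_1+S_2+S_3+S_4),$$
and show that if m = 1, these SS reduce (the summation over p now being redundant) to
$$S_1 = \Big(\sum_i\sum_j y_{ij}\Big)^2\Big/(rc),$$
$$S_2 = \sum_i\Big(\sum_j y_{ij}\Big)^2\Big/c - S_1,$$
$$S_3 = \sum_j\Big(\sum_i y_{ij}\Big)^2\Big/r - S_1,$$
$$S_4 = \sum_i\sum_j y_{ij}^2 - (S_1+S_2+S_3),$$
$$S_R = 0.$$
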
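35.2 Verify that if all $n_{ij} = m$, the matrix $\mathbf{C}$ defined at (35.39) becomes
$$\underset{(r-1)(c-1)\times(r-1)(c-1)}{\mathbf{C}} = \frac{1}{m}\begin{pmatrix} 2\mathbf{E} & \mathbf{E} & \cdots & \mathbf{E}\\ \mathbf{E} & 2\mathbf{E} & \cdots & \mathbf{E}\\ \vdots & & \ddots & \vdots\\ \mathbf{E} & \mathbf{E} & \cdots & 2\mathbf{E}\end{pmatrix},$$
where
$$\underset{(c-1)\times(c-1)}{\mathbf{E}} = \begin{pmatrix} 2 & 1 & \cdots & 1\\ 1 & 2 & \cdots & 1\\ \vdots & & \ddots & \vdots\\ 1 & 1 & \cdots & 2\end{pmatrix}.$$
Show that
$$\mathbf{C}^{-1} = \frac{m}{r}\begin{pmatrix} (r-1)\mathbf{E}^{-1} & -\mathbf{E}^{-1} & \cdots & -\mathbf{E}^{-1}\\ -\mathbf{E}^{-1} & (r-1)\mathbf{E}^{-1} & \cdots & -\mathbf{E}^{-1}\\ \vdots & & \ddots & \vdots\\ -\mathbf{E}^{-1} & -\mathbf{E}^{-1} & \cdots & (r-1)\mathbf{E}^{-1}\end{pmatrix},$$
where
$$\mathbf{E}^{-1} = \frac{1}{c}\begin{pmatrix} c-1 & -1 & \cdots & -1\\ -1 & c-1 & \cdots & -1\\ \vdots & & \ddots & \vdots\\ -1 & -1 & \cdots & c-1\end{pmatrix},$$
and hence verify that the LS estimator of $\theta_{ij}$ is given by (35.50).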


35.3 Generalize the AV table of Example 35.5 to a three-way hierarchical classification 
and give the three LR tests of the hypothesis that there is no variation in (a) the group means; 
(b) sub-group means within groups; and (c) sub-sub-group means within sub-groups. 


35.4 In Example 35.4, show that if it is postulated that H, of (35.54) holds, so that there 
are known to be no interactions, the SS attributable to row-effects is M — S;, and the SS attribut- 
able to column-effects is M—.S,, where S, and S, are defined by (35.59-60). Show that the 
Residual MS has (n—r—c+1) d.fr. in this case. 


35.5 Show that if, in a (r × 2) cross-classification, the differences of cell-means $d_i = \bar{y}_{i1.}-\bar{y}_{i2.}$
are analysed by the method of weighted squares of means applied to the $\bar{y}_i$ in 35.31, the SS
$$\sum_i W_i\Big(d_i - \sum_i W_i d_i\Big/\sum_i W_i\Big)^2,$$
where $W_i^{-1} = n_{i1}^{-1}+n_{i2}^{-1}$, provides the test of the hypothesis that the
interactions are all zero.


(Yates, 1934) 


35.6 In a 2 × 2 × 2 × ... = $2^n$ cross-classification, show that if a 2 × $2^{n-1}$ table is formed
for any one of the classifications (A, say) against all possible combinations of the others, the
unweighted mean of the differences of the row-cell means provides an efficient estimator and
test of the effect of A. Show that this is a generalization of 35.31, and that any interaction
can similarly be tested.
(Yates, 1934) 


35.7 The table below gives Brandt's data, used by Yates (1934), for an 8 × 2 cross-
classification by breed and sex of 533 slaughtered pigs; cell frequencies $n_{ij}$ and the cell totals
of the variable studied (the logarithm of the percentage bacon yielded by the carcass) are shown.

                    Female                        Male
Breed        n_i1     Σ_p y_i1p           n_i2     Σ_p y_i2p
I              33       66.55               89       181.04
II             51       98.69              141       281.43
III            13       25.90               17        34.20
IV              4        7.62                9        17.58
V               8       14.64                4         8.20
VI             15       28.11               32        64.42
VII            35       66.90               47        90.52
VIII           12       23.32               23        46.70
Totals        171      331.73              362       724.09


The Total SS, which is not obtainable from the above, was 13.0142. Using 35.31 and Exercise
35.5, show that (neglecting the 1 d.fr. for the general mean) the AV is

Variation                     D.fr.        SS         MS

Due to breeds                    7       0.6056     0.0865
Due to sexes                     1       0.3032     0.3032
Breeds and sexes                 8       1.0415

Interactions                     7       0.2300     0.0329
Total between classes           15       1.2715     0.0848

Residual                       517      11.7427     0.0227
Total minus general mean       532      13.0142


Show that breed and sex each have effects upon bacon yield, but that they do not interact. 


35.8 If there is only one observation per cell in Example 35.6, show that the AV table 
(35.95) holds good with the Residual SS = 0, and that if the second-order interactions are 
dropped from the linear model, the SS with $(r-1)(c-1)(l-1)$ d.fr. becomes the Residual
SS for testing all other parameters (cf. Example 35.3). 


35.9 Show that the AV for a (r × c) cross-classification with m observations in every cell
may be formally constructed from a (r × c × m) cross-classification with exactly one observation
per cell, in which the layer classification factor is “ replication,” its main effect and all interactions 
concerned with it being defined to be identically zero. 


35.10 For the Newman-Keuls procedure defined in 35.55, show that if the true means
of any subset of p of the k groups are all equal, while all the other groups have different means,
the probability of wrongly adjudging a pair in the “cluster” to be heterogeneous cannot exceed $\alpha$.

If there are m such clusters of truly equal means, show that the probability of wrongly
adjudging a pair from any such cluster to be heterogeneous cannot exceed $m\alpha$.


(Hartley, 1955) 


35.11 If in 35.57 the k variables $(\bar{x}_i-\theta_i)$ are multinormally distributed with all variances
$\sigma^2/N$ and all covariances equal to $-\lambda^2\sigma^2/N$, show that the k variables $z_i = \bar{x}_i-\theta_i+x_0$ are
independently normal if $x_0$ is a normal variable independent of the $\bar{x}_i$ with zero mean and variance
$\lambda^2\sigma^2/N$. By applying the method of 35.57 to the $z_i$, show that, with probability $1-\alpha$,
$$\bar{x}_i-\bar{x}_j-q_\alpha\{(1+\lambda^2)s^2/N\}^{1/2} \le \theta_i-\theta_j \le \bar{x}_i-\bar{x}_j+q_\alpha\{(1+\lambda^2)s^2/N\}^{1/2}$$
holds simultaneously for all $\tfrac{1}{2}k(k-1)$ differences $\theta_i-\theta_j$. Show that the result of 35.59 generalizes
in exactly the same way.
(Scheffé, 1959) 


35.12 In the one-way classification of Example 35.1, without loss of generality, take
the overall sample mean as origin and $\sigma$ as unit. Show that the value of
$$t^2 = \Big(\sum_i c_i\hat{\theta}_i\Big)^2\Big/\sum_i (c_i^2/n_i)$$
is maximized for choice of the $c_i$ when $c_i \propto n_i\hat{\theta}_i$, so that $\sum_i c_i = 0$ and $|t|$ is then the largest
observed absolute ratio of a contrast to its standard error. Show further that this maximum
value of $t^2$ equals the numerator SS (defined by (35.21)) of the overall variance-ratio test
defined by (35.22), so that the overall test essentially
tests the largest observed contrast.

(This result holds quite generally— 

cf. Scheffé (1959) and Gabriel (1964)) 


35.13 By defining a dummy parameter $\theta_0 = 0$ with estimator $\hat{\theta}_0 = 0$ also, show that all
linear combinations $\psi = \sum_i c_i\theta_i$ (where $\sum_i c_i$ need not be zero) may have confidence intervals
set for them by (35.113) with q increased by 1; and that the method of (35.108) may similarly
be used if k is increased by 1 and $\tfrac{1}{2}\sum_i |c_i|$ is replaced by
$$\max\Big\{\sum_{c_i>0} c_i,\ \sum_{c_i<0} |c_i|\Big\}.$$
(Tukey, 1953; Scheffé, 1959) 


35.14 In 35.64, consider k non-independent events with equal probabilities $P_1$ of occurring.
Show that $P_k$, the probability that they all occur, satisfies $P_k \ge 1-k(1-P_1)$, and let $P_1$ be the
probability $1-\alpha$ that a “Student's” t-statistic ($\nu$ d.fr.) lies in the interval $(-t_\alpha, t_\alpha)$. Show
that if m linear combinations $\lambda_s = \sum_i c_{si}\theta_i$, s = 1, 2, ..., m, are estimated by $l_s = \sum_i c_{si}\hat{\theta}_i$,
then $(l_s-\lambda_s)/\{\hat{V}(l_s)\}^{1/2}$ is distributed in “Student's” form with $\nu$ d.fr. for each s. Hence show
that
$$\operatorname{Prob}\left\{l_s-t_\alpha[\hat{V}(l_s)]^{1/2} \le \lambda_s \le l_s+t_\alpha[\hat{V}(l_s)]^{1/2},\ s = 1, 2, \ldots, m\right\} \ge 1-m\alpha.$$
(Dunn, 1961) 


35.15 Consider the distribution-free statistic U for testing k samples against ordered alternatives
defined at (31.151), and the competitive statistic U' defined similarly, except that $U_{pq}$
is replaced by
$$U'_{pq} = \sum_{i=1}^{n_p}\sum_{j=1}^{n_q}(x_{pi}-x_{qj}) = n_p n_q(\bar{x}_p-\bar{x}_q),$$
so that
$$U' = \sum_{p=1}^{k-1}\sum_{q=p+1}^{k} n_p n_q(\bar{x}_p-\bar{x}_q).$$
For normally distributed observations $x_{is}$ with means $E(x_{is}) = \theta_i$ and equal variances $\sigma^2$, show
that if $n_1 = n_2 = \ldots = n_k = N$, the asymptotic power functions of U and U' in testing equality
of the $\theta_i$ against the alternative hypothesis that the $\theta_i$ are equally spaced, are respectively
$$G\{\lambda(3/\pi)^{1/2}-\lambda_\alpha\} \quad\text{and}\quad G\{\lambda-\lambda_\alpha\},$$
where G is the standardized normal d.f., $\lambda^2 = N\sum_i(\theta_i-\bar{\theta})^2/\sigma^2$, and $G\{-\lambda_\alpha\} = \alpha$ as in Chapter 25.
Deduce that the ARE of U compared to U' is $3/\pi$. Show that this last result holds whatever
the relations among the ordered $\theta_i$.
(Bartholomew, 1961)


35.16 In 35.54, let R and P be any subsets of the k groups such that R contains P. Show
that (35.21) calculated for R can never be less than the same SS calculated for P.


35.17 Show that the simultaneous test procedure of 35.54 has the property that the
probability of wrongly adjudging any homogeneous subset to be heterogeneous is at most $\alpha$.
(Gabriel, 1964) 


35.18 In 35.54 define a new step-by-step procedure based on the SS (35.21), but applied 
in the manner of the last paragraph of 35.53. This has the same overall size $\alpha$ as the simultaneous
test procedure of 35.54. Show that for such a step-by-step procedure the critical value used 
for the SS must increase with the size of the subset being tested, and hence that a subset can 
only be adjudged heterogeneous by the step-by-step method if it is by the simultaneous method. 

(Gabriel, 1964) 


35.19 In 35.62-3, show by the Cauchy inequality that
$$\max\Big\{\sum_{i=1}^{k} c_i(\hat{\theta}_i-\theta_i)\Big\}^2 = \sum_{i=1}^{k} c_i^2\,\sum_{i=1}^{k}(\hat{\theta}_i-\theta_i)^2,$$
and hence that the largest squared difference between an observed contrast and its expectation
is distributed as $\big(\sum_i c_i^2\big)\,q s^2 F$, where F is the test statistic for the overall hypothesis that the $\theta_i$ are
equal. Hence establish (35.113).
(This proof is due to M. H. Belz and A. M. W. Verhagen.)


CHAPTER 36
OTHER MODELS FOR THE ANALYSIS OF VARIANCE 


36.1 Throughout the previous chapter, we have been concerned with the applica- 
tion of the general linear model to the analysis of observations classified into groups. 
Underlying the whole of the discussion was the assumption, explicitly written into 
the linear model, that the classification affected the observations through their mean 
values, but not otherwise. In the case of the general linear model, therefore, Analysis 
of Variance (AV) is accurately described as an analysis of means, which is carried out 
through certain sums of squares (SS) computed from the observations. 

It is a remarkable fact that we are led to very similar (and in the simpler cases, even 
identical) computations of SS when investigating a quite different type of situation. 
In the early development of the subject, this similarity in analysis tended to obscure 
the fundamental distinctions between the underlying mathematical models, which 
were first set out in detail by Eisenhart (1947). AV based on the LS analysis of the 
general linear model, as in Chapter 35, was there called Model I AV, a name in common 
use subsequently. The other well-established mathematical model, Model II AV, 
will now be investigated. 

The reader will realize that the Model I definition (cf. 35.4, 35.9-10) of the term 
“ Analysis of Variance" must now be broadened. We define AV generally as the 
study, whether by means of classified data (cf. 35.15) or not, of the resolution of SS
into component SS attributable to various factors, acting singly and in combination. 


Model II: components of variance 
36.2 Instead of the general linear model (19.8), consider the superficially similar
model
$$\underset{(n\times 1)}{\mathbf{y}} = \underset{(n\times 1)}{\mathbf{1}}\,\theta + \underset{(n\times p)}{\mathbf{X}}\,\underset{(p\times 1)}{\mathbf{u}} + \underset{(n\times 1)}{\boldsymbol{\epsilon}}. \tag{36.1}$$
In (36.1), as in Chapter 35 (and earlier in Exercise 19.1, Vol. 2), we isolate the “general
mean” $\theta$, which here will need no subscript. As before, $\mathbf{1}$ is a vector of units and $\mathbf{X}$
a known matrix of constants, while $\boldsymbol{\epsilon}$ is the vector of errors in the observations. The
crucial change is the replacement of the parameter-vector $\boldsymbol{\theta}$ in (19.8) by a vector $\mathbf{u}$
of p random variables. Thus (36.1) states that $y_i$ (i = 1, 2, ..., n) is composed of
the general mean $\theta$, plus a linear combination of p random variables $u_j$, plus an error
term, $\epsilon_i$. There are (p+1) random components of $y_i$ instead of only the one in
(19.8).

We assume, as at (19.9-10) for the general linear model, that

    $E(\epsilon) = 0$;  $V(\epsilon) = E(\epsilon\epsilon') = \sigma_e^2 I$,    (36.2)

where we have added an identifying subscript to the error variance $\sigma_e^2$ to distinguish
it from the variances of our new random variables. We further assume that

    $E(u) = 0$,    (36.3)

which is why we isolated the general mean $\theta$, and that

    $\mathrm{var}\, u_i = \sigma_i^2$;  $\mathrm{cov}(u_i, u_j) = 0$, $i \neq j$;
    $\mathrm{cov}(u_i, \epsilon_j) = 0$, all $i, j$.    (36.4)

Thus all our random variables have zero means and are uncorrelated in pairs.


36.3 The parameters of interest in the model (36.1) are the $\sigma_i^2$, the variances of
the $u_i$, and the error variance $\sigma_e^2$. We may reduce the dimensionality of the problem
by taking account of the fact that some of the $\sigma_i^2$ may be known to be equal. Suppose,
then, that there are $k$ different variances $\sigma_j^2$, where $k \le p$. We rewrite (36.1) in the form

    $y = \mathbf{1}\theta + \sum_{j=1}^{k} X_j u_j + \epsilon$,    (36.5)

where the $u_j$ are now vectors (subvectors of $u$), which contain $p_j$ uncorrelated random
variables $\left(\sum_{j=1}^{k} p_j = p\right)$ with zero means and variance $\sigma_j^2$. $X_j$ is an $(n \times p_j)$ submatrix
of $X$.

In (36.5) we write $W$ for the matrix $E(yy')$ and $V_y$ for the dispersion matrix of $y$,
assumed to be non-singular, so that, using (36.2-4),

    $V_y = W - \theta^2 \mathbf{1}\mathbf{1}' = \sum_{j=1}^{k} \sigma_j^2 X_j X_j' + \sigma_e^2 I$.    (36.6)



Essentially, therefore, we wish to estimate the parameters which appear as coefficients
of the $(k+1)$ components of the dispersion matrix of the observations, and it is appropriate
that (36.5) is often called a components of variance model for AV as an alternative
to the "Model II" label.

(36.6) emphasizes an essential distinction between (36.5) and the general linear
model: the observations are not now all uncorrelated ($V_y$ is not diagonal), since they
have as components linear functions of the same variables $u_j$.
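
As a concrete illustration of (36.6), the following sketch builds $V_y$ for the simplest case of a single set of random variables ($k = 1$), with the observations arranged in groups that share a common $u$. The function name `dispersion_matrix` and the numerical values are illustrative assumptions, not part of the text.

```python
def dispersion_matrix(group_sizes, var_u, var_e):
    """V_y = var_u * X1 X1' + var_e * I, i.e. (36.6) with k = 1.

    X1 is the 0/1 group-indicator matrix, so (X1 X1')[r][s] is 1 when
    observations r and s share the same random variable u, else 0.
    """
    labels = [g for g, size in enumerate(group_sizes) for _ in range(size)]
    n = len(labels)
    return [[var_u * (labels[r] == labels[s]) + var_e * (r == s)
             for s in range(n)]
            for r in range(n)]


# Two groups of two observations; var_u = 3 and var_e = 1 are made-up values.
V = dispersion_matrix([2, 2], var_u=3.0, var_e=1.0)
for row in V:
    print(row)
```

The diagonal entries equal var_u + var_e, observations in the same group have covariance var_u, and observations in different groups are uncorrelated, so $V_y$ is block-diagonal rather than diagonal.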


Example 36.1
Consider the model (36.5) with $k = 1$, $X_1$ an $(n \times p)$ matrix patterned like $X$ in
Example 35.1, and $u_1$ a $(p \times 1)$ vector. This is a one-way classification problem for
the present model, in which the observations are in $p$ groups and the $q$th observation
in the $i$th group is
    $y_{iq} = \theta + u_i + \epsilon_{iq}$,
where the $u_i$ (the $p$ components of $u_1$) and the $\epsilon_{iq}$ all have zero means, are all
uncorrelated, and have variances
    $\mathrm{var}\, u_i = \sigma_1^2$, all $i$,
    $\mathrm{var}\, \epsilon_{iq} = \sigma_e^2$, all $i, q$.
It follows immediately that
    $\mathrm{var}\, y_{iq} = \sigma_1^2 + \sigma_e^2$,
so that here the two parameters of the model are literally components of the variance
of the observations. It was this which gave rise to the term "components of variance"
which we have attached to the model (36.5) in general.


36.4 Our investigation of the properties of the model (36.5) must start from
the beginning; none of the LS theory used in the general linear model is now applicable.
Our treatment follows that of Graybill and Hultquist (1961).

(36.5) can be written more symmetrically as

    $y = \sum_{j=0}^{k+1} X_j u_j$,    (36.7)

where we define
    $X_0 = \mathbf{1}$,  $u_0 = \theta$ (scalar),  $X_{k+1} = I$,  $u_{k+1} = \epsilon$.
There are $(k+2)$ parameters in (36.7), namely $\theta$ and the $(k+1)$ variances appearing
in the model. The assumptions (36.2-4) are now summarized as
    $E(u_j) = 0$,  $V(u_j) = E(u_j u_j') = \sigma_j^2 I$,
    $E(u_i u_j') = 0$, $i \neq j$;  $i, j = 1, 2, \ldots, k+1$,    (36.8)
where $\sigma_{k+1}^2 = \sigma_e^2$. Formally, we may write

    $\sigma_0^2 = E(u_0 u_0') = \theta^2$,    (36.9)

so that our parameters are $\sigma_j^2$ $(j = 0, 1, 2, \ldots, k+1)$. We can now rewrite (36.6)
as

    $W = E(yy') = \sum_{j=0}^{k+1} \sigma_j^2 A_j$,    (36.10)

where $A_j = X_j X_j'$, and

    $V_y = \sum_{j=1}^{k+1} \sigma_j^2 A_j$.    (36.11)


Unbiassed quadratic estimation of the parameters 

36.5 In our investigation of the general linear model, we found at (19.19) the
condition which must be satisfied if linear functions of the parameters are to be
unbiassedly estimated by linear functions of the observations. In the components of
variance model (36.7), the parameters occur in (36.10) as coefficients in $W$, the matrix
of second-order moments of the observations, and it is natural to seek quadratic
estimators of them. We now prove that a necessary and sufficient condition that the $\sigma_j^2$
be unbiassedly estimable by quadratic forms $y' C_j y$ is that the matrices $A_j$ are linearly
independent.


36.6 First, assume that there exist matrices $C_i$ such that

    $E(y' C_i y) = \sigma_i^2$.    (36.12)

Using (36.7-8), this implies that

    $\sigma_i^2 = E\left\{\left(\sum_j X_j u_j\right)' C_i \left(\sum_j X_j u_j\right)\right\} = \sum_j E(u_j' X_j' C_i X_j u_j)$.    (36.13)

As at (19.38),

    $E(u_j' B u_j) = \sigma_j^2\, \mathrm{tr}\, B$,  $j = 1, 2, \ldots, k+1$,    (36.14)

and this also holds trivially for $j = 0$, so (36.13) becomes

    $\sigma_i^2 = \sum_{j=0}^{k+1} \sigma_j^2\, \mathrm{tr}(X_j' C_i X_j) = \sum_{j=0}^{k+1} \sigma_j^2\, \mathrm{tr}(X_j X_j' C_i) = \sum_{j=0}^{k+1} \sigma_j^2\, \mathrm{tr}(A_j C_i)$.    (36.15)

Equating coefficients of $\sigma_j^2$ in (36.15), we find

    $\mathrm{tr}(A_i C_i) = 1$,
    $\mathrm{tr}(A_j C_i) = 0$, $j \neq i$.    (36.16)

Now let $l_0, l_1, \ldots, l_{k+1}$ be any constants satisfying

    $\sum_{j=0}^{k+1} l_j A_j = 0$,    (36.17)

which, if the $l_j$ were non-zero, would make the $A_j$ linearly dependent. Since, by
(36.16-17), it would then follow that

    $l_i = \sum_j l_j\, \mathrm{tr}(A_j C_i) = \mathrm{tr}\left(C_i \sum_j l_j A_j\right) = 0$,

we see that (36.17) can only hold if the $l_j$ are all equal to zero. Thus (36.12) implies
linear independence of the $A_j$.


36.7 To prove the converse result, let the $\tfrac{1}{2}n(n+1)$ elements on and above the
leading diagonal of the $(n \times n)$ matrices $yy'$ (whose expectation is $W$) and $A_j$ in (36.10) be placed in the same
(arbitrary) order into vectors $w^*$, $a_j^*$, $j = 0, 1, \ldots, k+1$. (36.10) then implies that

    $E(w^*) = \sum_{j=0}^{k+1} \sigma_j^2 a_j^*$.    (36.18)

If we write $A^*$ for the $[\tfrac{1}{2}n(n+1) \times (k+2)]$ matrix formed by juxtaposing the $a_j^*$, and
$\sigma^2$ for the vector of the $\sigma_j^2$, (36.18) becomes
    $E(w^*) = A^* \sigma^2$.    (36.19)
Since the $A_j$ are now assumed linearly independent, so are the $a_j^*$ which were formed
from their elements. Thus $A^*$, formed from the $a_j^*$, has rank $(k+2)$, and we can find
$(k+2)$ linearly independent rows in it which form a non-singular submatrix $A^{**}$.
We write $w^{**}$ for the corresponding subvector of $w^*$. (36.19) implies that
    $E(w^{**}) = A^{**} \sigma^2$
and hence
    $(A^{**})^{-1} E(w^{**}) = \sigma^2$,
or
    $E\{(A^{**})^{-1} w^{**}\} = \sigma^2$,    (36.20)
which establishes that there is a quadratic unbiassed estimator of the parameter-vector. 

We henceforth assume the linear independence of the A,. 
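
The conditions (36.16) can be checked numerically. The sketch below is one possible construction, not the one used in the proof above: instead of selecting a non-singular submatrix, it looks for each $C_i$ as a linear combination of the $A_j$ and solves the resulting $(k+2)$ equations exactly. The helper names (`trace_prod`, `solve`, `estimator_matrices`) and the small balanced one-way layout are assumptions made for illustration.

```python
from fractions import Fraction


def trace_prod(A, B):
    """tr(AB) for square matrices stored as nested lists."""
    n = len(A)
    return sum(A[r][s] * B[s][r] for r in range(n) for s in range(n))


def solve(G, b):
    """Solve G x = b exactly by Gauss-Jordan elimination."""
    m = len(G)
    M = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(G)]
    for col in range(m):
        # A missing pivot (StopIteration) would signal linearly dependent A_j.
        piv = next(r for r in range(col, m) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(m):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[r][m] for r in range(m)]


def estimator_matrices(A_list):
    """Return C_0, ..., C_{k+1} with tr(A_j C_i) = 1 if i == j, else 0."""
    m, n = len(A_list), len(A_list[0])
    G = [[trace_prod(Ai, Aj) for Aj in A_list] for Ai in A_list]
    C_list = []
    for i in range(m):
        coef = solve(G, [int(j == i) for j in range(m)])
        C_list.append([[sum(coef[j] * A_list[j][r][s] for j in range(m))
                        for s in range(n)] for r in range(n)])
    return C_list


# Balanced one-way layout: p = 2 groups of 2 observations (illustrative).
labels = [0, 0, 1, 1]
A0 = [[1] * 4 for _ in range(4)]                          # X_0 X_0' = 11'
A1 = [[int(a == b) for b in labels] for a in labels]      # X_1 X_1'
A2 = [[int(r == s) for s in range(4)] for r in range(4)]  # X_2 X_2' = I
C = estimator_matrices([A0, A1, A2])
checks = [[trace_prod(Aj, Ci) for Aj in (A0, A1, A2)] for Ci in C]
print(checks)
```

Row $i$ of `checks` holds $\mathrm{tr}(A_j C_i)$ for $j = 0, 1, 2$, so the printed array is the identity pattern required by (36.16); by (36.15) each $y' C_i y$ is then unbiassed for $\sigma_i^2$.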


Sufficient statistics in the normal commutative case 

36.8 So far, no assumptions have been made concerning the distributional forms
of the random variables $u_j$ in our model. We now investigate the case where each
$u_j$ $(j = 1, 2, \ldots, k+1)$ is multinormally distributed. Together with the zero means
and covariances assumed in (36.8), this implies that the $p$ random variables $u_j$,
$j = 1, 2, \ldots, p$, are independent normal variables with zero means.

The normality assumption alone will not take us very much further: to make
substantial progress, we must impose the additional condition of commutativity

    $A_i A_j = A_j A_i$,  $i, j = 0, 1, 2, \ldots, k+1$.    (36.21)

The $A_j$ are symmetric, so we always have

    $A_i A_j = A_i' A_j' = (A_j A_i)'$.

Thus what (36.21) requires is that each $A_i A_j$, as well as each $A_j$, be symmetric. That
this condition is restrictive is shown in Example 36.2.


Example 36.2
Consider again the one-way classification situation in Example 36.1, with $k = 1$.
Here $A_0 = \mathbf{1}\mathbf{1}'$ as always, and from Example 35.1,

    $A_1 = \begin{pmatrix} (\mathbf{1}\mathbf{1}')_{n_1} & & & 0 \\ & (\mathbf{1}\mathbf{1}')_{n_2} & & \\ & & \ddots & \\ 0 & & & (\mathbf{1}\mathbf{1}')_{n_p} \end{pmatrix}$,

where the suffixes give the number of rows and columns in the submatrices. Multiplication
shows that $A_0 A_1$ in its first $n_1$ columns has every element equal to $n_1$, in its
next $n_2$ columns has every element equal to $n_2$, and so on until in its last $n_p$ columns
every element is equal to $n_p$. Thus $A_0 A_1$ is symmetric ($A_0$ and $A_1$ commute) only
if all the $n_i$ are equal. Since $X_2 = I$, $A_2 = I$ also and always commutes. The present
model will therefore cover the one-way classification only in the balanced case, with
equal frequencies in the $p$ groups. Contrast Example 35.1 for Model I, where group
frequencies were quite unimportant.
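
The multiplication described in Example 36.2 is easily checked by machine. The short sketch below, with hypothetical helper names `block_ones`, `matmul` and `commutes`, forms $A_0$ and $A_1$ for given group sizes and tests whether they commute.

```python
def block_ones(sizes):
    """A_1 = X_1 X_1': block-diagonal matrix of all-ones blocks."""
    labels = [g for g, size in enumerate(sizes) for _ in range(size)]
    return [[1 if a == b else 0 for b in labels] for a in labels]


def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def commutes(sizes):
    """True when A_0 = 11' and A_1 commute for the given group sizes."""
    n = sum(sizes)
    A0 = [[1] * n for _ in range(n)]
    A1 = block_ones(sizes)
    return matmul(A0, A1) == matmul(A1, A0)


print(commutes([2, 2, 2]))  # balanced layout   -> True
print(commutes([1, 2, 3]))  # unbalanced layout -> False
```

In the unbalanced case the product $A_0 A_1$ is constant down each column while $A_1 A_0$ is constant along each row, so the two can agree only when every group has the same size.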


36.9 As Example 36.2 indicates, we can only expect (36.21) to hold in general 
for the balanced case. We now proceed with our investigation, bearing this restriction 
in mind. 

The normality and independence of the $p$ random variables $u_j$, $j = 1, 2, \ldots, p$,
implies that the (correlated) variables $y_1, y_2, \ldots, y_n$ are multinormally distributed
with

    $E(y) = X_0 u_0 = \mathbf{1}\theta$

and dispersion matrix $V_y$ given by (36.11). The quadratic form in the exponent of
their multinormal distribution is therefore

    $Q = (y - \mathbf{1}\theta)' V_y^{-1} (y - \mathbf{1}\theta)$,    (36.22)

distributed in the chi-squared form with $n$ d.fr. by 15.10.


36.10 Now, because of the commutativity condition (36.21), there exists an
orthogonal matrix $P$ which simultaneously diagonalizes all the $A_j$, so that
    $P A_j P' = D_j$,    (36.23)
where $D_j$ is a diagonal matrix. Moreover, we may choose $P$ so that one row (say, its
first, denoted by $P_1$) has elements all equal to $n^{-1/2}$ and
    $P_1 \mathbf{1} = n^{1/2}$,  $P_j \mathbf{1} = 0$, $j \neq 1$,    (36.24)
where $P_j$ is any set of rows of $P$ not including $P_1$.

(36.10-11) show that $W$ and $V_y$ are also diagonalized by $P$, the leading diagonal
of $PWP'$ and $PV_yP'$ respectively containing the latent roots of $W$ and those of $V_y$. It
follows at once from (36.24) that these two sets of latent roots (which are all positive)
coincide except for the first: if the roots of $W$ are $\lambda_j$, $j = 1, 2, \ldots, n$, those of $V_y$ are
$\lambda_1 - n\theta^2 = \lambda_1^*$, say, and $\lambda_j$, $j > 1$. These latent roots are, of course, functions of the
parameters $\sigma_j^2$.

If $s$ is the number of distinct roots of $W$, and $s^*$ the number of distinct roots of $V_y$,
we may have $s^* = s - 1$ when $\lambda_1^*$ does, but $\lambda_1$ does not, coincide with some other root;
or $s^* = s + 1$ if this situation is reversed; or $s^* = s$ if neither or both of $\lambda_1$, $\lambda_1^*$ coincide
with another root. Graybill and Hultquist (1961) show that $s \ge k+2$; and that

    $s^* \le 1 + \mathrm{rank}(X_1 : X_2 : \ldots : X_k)$,

subject to a further condition.
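
The relation between the latent roots of $W$ and $V_y$ can be seen in a small balanced one-way layout. The sketch below does not compute a full spectral decomposition; it only verifies that three hand-chosen vectors (the unit vector, a between-group contrast and a within-group contrast) are latent vectors and reports their roots. The function names and the parameter values ($p = 3$, $n_i = 2$, $\theta = 5$, $\sigma_1^2 = 3$, $\sigma_e^2 = 1$) are illustrative assumptions.

```python
from fractions import Fraction


def one_way_V_and_W(p, n_i, theta, var_u, var_e):
    """V_y from (36.11) and W = V_y + theta^2 11' for p groups of size n_i."""
    labels = [g for g in range(p) for _ in range(n_i)]
    n = len(labels)
    V = [[var_u * (labels[r] == labels[s]) + var_e * (r == s)
          for s in range(n)] for r in range(n)]
    W = [[V[r][s] + theta ** 2 for s in range(n)] for r in range(n)]
    return V, W


def latent_root(M, v):
    """Return lam if M v = lam v exactly, otherwise None."""
    Mv = [sum(m * x for m, x in zip(row, v)) for row in M]
    k = next(i for i, x in enumerate(v) if x != 0)
    lam = Fraction(Mv[k], v[k])
    return lam if all(a == lam * x for a, x in zip(Mv, v)) else None


V, W = one_way_V_and_W(p=3, n_i=2, theta=5, var_u=3, var_e=1)
ones = [1] * 6
between = [1, 1, -1, -1, 0, 0]   # contrast between two group means
within = [1, -1, 0, 0, 0, 0]     # contrast inside one group
roots_V = [latent_root(V, v) for v in (ones, between, within)]
roots_W = [latent_root(W, v) for v in (ones, between, within)]
print(roots_V)  # roots 7, 7, 1
print(roots_W)  # roots 157, 7, 1
```

Only the root belonging to the unit vector changes, by $n\theta^2 = 150$. In this layout $\lambda_1^*$ coincides with the between-groups root, so $W$ has $s = 3 = k + 2$ distinct roots while $V_y$ has $s^* = s - 1 = 2$.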


36.11 Since $PP' = I$, (36.22) is identical with
    $Q = (Py - P\mathbf{1}\theta)' (P V_y P')^{-1} (Py - P\mathbf{1}\theta)$.    (36.25)
We now partition $P$ into $P_1, P_2, \ldots, P_t$, where $P_1$ is the first row as before, $P_j$ $(j > 1)$
is of order $(n_j \times n)$, and $n_j$ is the multiplicity of the root $\lambda_j$ when $\lambda_1^*$ is ignored. Thus
$t = s$ or $t = s + 1$. Using (36.24), (36.25) now becomes

    $Q = \begin{pmatrix} P_1 y - n^{1/2}\theta \\ P_2 y \\ \vdots \\ P_t y \end{pmatrix}' \begin{pmatrix} 1/\lambda_1^* & & & 0 \\ & I_2/\lambda_2 & & \\ & & \ddots & \\ 0 & & & I_t/\lambda_t \end{pmatrix} \begin{pmatrix} P_1 y - n^{1/2}\theta \\ P_2 y \\ \vdots \\ P_t y \end{pmatrix}$,    (36.26)

where $I_j$ is the identity matrix of order $n_j$.
We see at once from (36.26) that the $t$ statistics
    $P_1 y$,  $y' P_j' P_j y$,  $j = 2, 3, \ldots, t$,
are sufficient for the $(k+2)$ parameters of the model, since no other statistic enters
the Likelihood Function. It is intuitively obvious that they are minimal sufficient.
The proof of this follows directly by the method of 23.18 if $\lambda_1^*$ is not equal to any other
of the $\lambda_j$; if it is so equal, the proof is extended by Graybill and Hultquist (1961), who
also use a result by Gautschi (1959) given in 36.16 below to show that if $t = s = k+2$
(its smallest possible value; see 36.10) the set of minimal sufficient statistics is complete.


36.12 (36.26) shows directly that the statistic $P_1 y$ has a marginal univariate normal
distribution with mean $n^{1/2}\theta$ and variance $\lambda_1^* = P_1 V_y P_1'$. Further, each of the other
$(t-1)$ components of the minimal sufficient set is a quadratic form

    $Q_j = y' \lambda_j^{-1} P_j' P_j y$,  $j = 2, 3, \ldots, t$,

in multinormal variables. Each matrix $\lambda_j^{-1} P_j' P_j V_y$ is idempotent, since $P_j V_y P_j' = \lambda_j I_j$.
Furthermore, since $P_j V_y P_l' = 0$, $j \neq l$, the sum of the $Q_j$,

    $Q^* = \sum_{j=2}^{t} Q_j$,

also has the property that its matrix

    $\sum_{j=2}^{t} \lambda_j^{-1} P_j' P_j V_y$

is idempotent. Thus $Q^* = \sum_{j=2}^{t} Q_j$ is a decomposition of the type discussed in 35.7.

The idempotencies just stated therefore imply the result that the $Q_j$, $j = 2, 3, \ldots, t$,
are distributed with degrees of freedom $n_j$ in the $\chi^2$ distribution, which is here central
because the non-central parameter is $(\mathbf{1}\theta)' \lambda_j^{-1} P_j' P_j (\mathbf{1}\theta) = 0$ by (36.24). Since the
result above for $P_1 y$ is equivalent to the quadratic form $Q_1 = y' (\lambda_1^*)^{-1} P_1' P_1 y$ having
a non-central $\chi^2$ distribution with 1 d.fr. and non-central parameter $n\theta^2/\lambda_1^*$, we see finally
that the $t$ quadratic forms $Q_j$ have ranks adding to $n$, the rank of their sum $Q$ at (36.26).
Each is therefore independent of each of the others by 35.7.


Analysis of variance in Model II 

36.13 We now observe an important distinction between Model I and Model II:
in the latter, the SS $y'y$ cannot be decomposed into quadratic forms which are themselves
independently distributed in the (central or non-central) chi-squared form, for
$y'y$ itself is not so distributed, on account of the covariances between the observations;
it is (36.22) which has that property instead. However, 36.11-12 show that the $t$ quadratic
forms $Q_j$ are so distributed. The matrices of these forms are

    $P_1' P_1 / \lambda_1^*$,  $P_j' P_j / \lambda_j$,  $j = 2, 3, \ldots, t$,

and since $\sum_{j=1}^{t} P_j' P_j = P'P = I$, we see that we may decompose $y'y$ into $t$ quadratic
forms which, when divided by the corresponding distinct latent roots of $V_y$, are independent
chi-squared variables. Moreover, the degrees of freedom are simply the
multiplicities of the roots, and all the forms except $Q_1$ have central distributions.


36.14 Since
    $E(Q_j) = n_j$,  $j \ge 2$,
we have at once
    $E(y' P_j' P_j y / n_j) = \lambda_j$,  $j \ge 2$.    (36.27)
The latent roots may therefore be found as the expectations of the corresponding
Mean Squares (MS) in the AV table. Since
    $E(P_1 y) = n^{1/2}\theta$,
$\theta$ is estimated by the mean of all the observations $\bar{y}$, as is obvious. We require to
estimate the other $(k+1)$ parameters, the variances. If the $\lambda_j$ are $(k+1)$ different
functions of these parameters, (36.27) may be solved to give estimators of the parameters.

In the common case when the $\lambda_j$ are all linear functions of the parameters, (36.27)
is particularly easy to solve to give unbiassed estimators of the $\sigma_j^2$.

Graybill and Hultquist (1961) show that an AV in our extended sense exists with
the latent roots all different functions of the parameters if and only if the commutativity
condition (36.21) holds and $W = E(yy')$ has $s = k+2$ (the minimum possible number;
see 36.10) distinct latent roots. Under the multinormality assumption, the set of
sufficient statistics is then complete (cf. 36.11) and the estimators are unique MV
unbiassed for their expectations, by 17.35. Graybill and Hultquist (1961) show that
the MS in (36.27) remain MV unbiassed quadratic estimators of their expectations
under weaker conditions than multinormality.


36.15 We are now in a position to connect our study of Model II with the Model I
AV of classifications investigated in Chapter 35. Any Model I AV table is a decomposition
of the SS $y'y$ into component SS (of which one is for the general mean),
and the d.fr. of the table (ranks of the quadratic forms) always add to $n$, the number
of observations. If (36.21) is satisfied, the same decomposition (cf. 36.13) will hold
in Model II, since these ranks remain additive, but it is now fundamentally (36.22)
which is being decomposed as at (36.26), and it is the ratio of each SS (except the
general mean) to the expectation of its MS which is distributed in the chi-squared
form, which is now always central.

To every Model I AV (balanced, to satisfy (36.21)) there is therefore a corresponding
balanced Model II AV.


Example 36.3 Balanced one-way classification
In the balanced one-way classification treated in Examples 36.1-2, the Model I AV
table (35.24) will hold under Model II also, by 36.15. We require the expected values
of the MS in the table, except that for the general mean (for which, see Example 36.5).
It is immediately evident that the Residual SS, which we now call $S_3$, satisfies the
identity

    $S_3 = \sum_i \sum_q (y_{iq} - \bar{y}_{i.})^2 = \sum_i \sum_q (\epsilon_{iq} - \bar{\epsilon}_{i.})^2$,

and it has exactly the same distribution as in Model I, since it is free of $\theta$ and the $u_i$.
The corresponding MS therefore has expected value $\sigma_e^2 = \lambda_3$, say. The between-groups
SS is denoted as at (35.21) by

    $S_2 = \sum_i n_i (\bar{y}_{i.} - \bar{y}_{..})^2 = n_i \sum_i \{(u_i + \bar{\epsilon}_{i.}) - (\bar{u}_{.} + \bar{\epsilon}_{..})\}^2$;

$n_i$ is a constant since we now have $n$ observations in $p$ groups(*) of equal size, $n_i = n/p$.
Because the $u$'s and the $\epsilon$'s are independent,

    $\mathrm{var}(u_i + \bar{\epsilon}_{i.}) = \sigma_1^2 + \sigma_e^2/n_i$,

for $\bar{\epsilon}_{i.}$ is a mean of $n_i$ error terms. Since $E(u_i + \bar{\epsilon}_{i.}) = 0$, this shows directly that the
variable

    $\sum_i \{(u_i + \bar{\epsilon}_{i.}) - (\bar{u}_{.} + \bar{\epsilon}_{..})\}^2 / (\sigma_1^2 + \sigma_e^2/n_i)$

has the chi-squared form with $(p-1)$ d.fr., since it is a standardized sum of squares
about the sample mean. The expected value of $S_2$, the between-groups SS, is therefore
$(p-1) n_i (\sigma_1^2 + \sigma_e^2/n_i)$ and that of the MS is $E\{S_2/(p-1)\} = \sigma_e^2 + n_i \sigma_1^2 = \lambda_2$, say.
We thus have the two independently distributed chi-square variables

    $S_2/\lambda_2 = \sum_i n_i (\bar{y}_{i.} - \bar{y}_{..})^2 / (\sigma_e^2 + n_i \sigma_1^2)$,   $S_3/\lambda_3 = \sum_i \sum_q (y_{iq} - \bar{y}_{i.})^2 / \sigma_e^2$,

whose ratio, after division by d.fr. ($(p-1)$ and $(n-p)$ respectively) will have the F-distribution.
If $\sigma_1^2 = 0$, the denominators are identical, so that the same F-statistic
(35.22) as in Example 35.1 may be used here to test $\sigma_1^2 = 0$, which is the Model II
hypothesis of no difference between groups.

The same hypothesis in the two different models may thus be tested by the same
statistic. But two points should be noted. First, we have as yet no assurance that
this is an optimum test in Model II, as we know it to be in Model I from general linear
hypothesis theory. Second, although the test statistic is the same for both models,
its distribution is the same only under the hypothesis of no difference between groups.
Its power function must necessarily be different in the two models, since the alternatives
are quite different in the two cases (cf. Exercise 36.1).

(*) We use $p$ here, where $k$ was used in Example 35.1, because $k$ has another connotation in
this chapter.

It will be seen by solving the equations for $E(S_2)$ and $E(S_3)$ that

    $E\left[\frac{1}{n_i}\left\{\frac{S_2}{p-1} - \frac{S_3}{n-p}\right\}\right] = \sigma_1^2$.

This unbiassed estimator of $\sigma_1^2$ can evidently be negative; it remains the MV unbiassed
estimator since it is a function of the complete sufficient statistics $(\bar{y}, S_2, S_3)$ (cf. 36.11).
If it is truncated to zero whenever the ML estimator of $\lambda_2$ < that of $\lambda_3$ (as in Exercise
36.5), bias appears, but the mean-square-error is much reduced. Further improvement
is possible; see Wang (1967).
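
The computations of Example 36.3 may be sketched as follows. The function name `one_way_variance_components` and the six data values are invented for illustration; the code computes $S_2$, $S_3$, the two Mean Squares, the unbiassed estimator of $\sigma_1^2$ (which is not truncated here) and the F-ratio.

```python
def one_way_variance_components(groups):
    """ANOVA estimates for a balanced one-way Model II layout.

    groups: list of p lists, each holding n_i observations.
    """
    p, n_i = len(groups), len(groups[0])
    if any(len(g) != n_i for g in groups):
        raise ValueError("the layout must be balanced")
    n = p * n_i
    means = [sum(g) / n_i for g in groups]
    grand = sum(means) / p
    # Between-groups SS (S_2) and Residual SS (S_3).
    s2 = n_i * sum((mean - grand) ** 2 for mean in means)
    s3 = sum((y - mean) ** 2 for g, mean in zip(groups, means) for y in g)
    ms2, ms3 = s2 / (p - 1), s3 / (n - p)
    return {
        "S2": s2,
        "S3": s3,
        "var_e": ms3,                 # estimates lambda_3 = sigma_e^2
        "var_u": (ms2 - ms3) / n_i,   # unbiassed for sigma_1^2, may be negative
        "F": ms2 / ms3,               # F with (p - 1, n - p) d.fr. when sigma_1^2 = 0
    }


result = one_way_variance_components([[1, 3], [5, 7], [9, 11]])
print(result)
```

For these data $S_2 = 64$ and $S_3 = 6$, and together with $n\bar{y}^2 = 216$ they add to $y'y = 286$, as the decomposition of 36.13 requires.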


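Example 36.4 Balanced two-way cross-classification
For the balanced two-way cross-classification, the Model I AV table ((35.65) and
Exercise 35.1) will hold under Model II by 36.15, apart from its last column, which is
specific to Model I. In the model (36.7), we now have $k = 3$. For convenience,
we denote the three variances to be estimated by $\sigma_R^2$, $\sigma_C^2$ and $\sigma_{RC}^2$, indicating that they are
respectively the variances of the Row, the Column and the Interaction (Row x Column)
random variables $u$. It is easy to see, as in Example 36.2, that the commutativity
condition (36.21) holds; indeed, considerations of symmetry make this obvious.
Leaving aside the general mean, the four MS must have their expectations evaluated.
As in Example 36.3, examination of the model, now written in an obvious notation

    $y_{ijp} = \theta + u_{i.} + u_{.j} + u_{ij} + \epsilon_{ijp}$    (36.28)
    $(i = 1, 2, \ldots, r;\ j = 1, 2, \ldots, c;\ p = 1, 2, \ldots, m)$,

shows that the Residual SS, now denoted by $S_5 = \sum_i \sum_j \sum_p (y_{ijp} - \bar{y}_{ij.})^2$, is identical
with $\sum_i \sum_j \sum_p (\epsilon_{ijp} - \bar{\epsilon}_{ij.})^2$ and has the same distribution as in Model I, so that its MS
has expectation $\sigma_e^2$, and this will evidently be generally true for the balanced Model II.
The Rows SS is now written

    $S_2 = cm \sum_i (\bar{y}_{i..} - \bar{y}_{...})^2 = cm \sum_i \{(u_{i.} + \bar{u}_{i.} + \bar{\epsilon}_{i..}) - (\bar{u}_R + \bar{u}_{..} + \bar{\epsilon}_{...})\}^2$,

where barred $u$'s with dotted suffixes are averages of the interaction variables $u_{ij}$, and
$\bar{u}_R$ is the average of the row variables $u_{i.}$ (the column variables cancel in the difference);
and as in Example 36.3,
    $\mathrm{var}(u_{i.} + \bar{u}_{i.} + \bar{\epsilon}_{i..}) = \sigma_R^2 + \sigma_{RC}^2/c + \sigma_e^2/(cm)$.
Thus the Rows MS has expectation

    $E\{S_2/(r-1)\} = \sigma_e^2 + m\sigma_{RC}^2 + cm\sigma_R^2$,    (36.29)

and by interchanging row and column symbols, we have similarly, for the Columns MS,

    $E\{S_3/(c-1)\} = \sigma_e^2 + m\sigma_{RC}^2 + rm\sigma_C^2$.    (36.30)

The Interactions SS is written
    $S_4 = m \sum_i \sum_j (\bar{y}_{ij.} - \bar{y}_{i..} - \bar{y}_{.j.} + \bar{y}_{...})^2$
        $= m \sum_i \sum_j \{(u_{ij} + \bar{\epsilon}_{ij.}) - (\bar{u}_{i.} + \bar{\epsilon}_{i..}) - (\bar{u}_{.j} + \bar{\epsilon}_{.j.}) + (\bar{u}_{..} + \bar{\epsilon}_{...})\}^2$.    (36.31)
The expression in braces on the right of (36.31) has zero mean, and the expectation of
its square is therefore its variance. Now

    $\mathrm{var}(u_{ij} - \bar{u}_{i.} - \bar{u}_{.j} + \bar{u}_{..}) = \sigma_{RC}^2 (1 - 1/c)(1 - 1/r)$,

as may be seen by evaluating the four variances and six covariances required. Similarly

    $\mathrm{var}(\bar{\epsilon}_{ij.} - \bar{\epsilon}_{i..} - \bar{\epsilon}_{.j.} + \bar{\epsilon}_{...}) = (\sigma_e^2/m)(1 - 1/c)(1 - 1/r)$,

and thus the SS has expectation

    $E(S_4) = mrc\,(\sigma_{RC}^2 + \sigma_e^2/m)(1 - 1/c)(1 - 1/r)$.

Dividing by the d.fr. $(r-1)(c-1)$ and cancelling, this gives for the MS

    $E\{S_4/[(r-1)(c-1)]\} = \sigma_e^2 + m\sigma_{RC}^2$.    (36.32)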
We have thus obtained a column of expected MS to replace the final column of the table
(35.65):

                       E(MS)
    Rows:          $\lambda_2 = \sigma_e^2 + m\sigma_{RC}^2 + cm\sigma_R^2$
    Columns:       $\lambda_3 = \sigma_e^2 + m\sigma_{RC}^2 + rm\sigma_C^2$      (36.33)
    Interactions:  $\lambda_4 = \sigma_e^2 + m\sigma_{RC}^2$
    Residual:      $\lambda_5 = \sigma_e^2$

(36.33) is immediately solvable to give unbiassed estimators of the variance parameters.
Apart from $\hat{\sigma}_e^2$, any of these may be negative since they are differences, just as at the
end of Example 36.3, and the remark made there applies here equally.

(36.33) tells us which functions of the parameters must be used as divisors to obtain
central chi-squared variables from the SS in the AV Table. Examination of these
shows that the happy coincidence of Model II and Model I tests found in the one-way
classification of Example 36.3 no longer survives here. If we wish to test that either
$\sigma_R^2$ (or $\sigma_C^2$) = 0, the expected MS of Rows (or Columns) will be equal to that for
Interactions, not for Residual, and tests based on $S_2/S_4$ and $S_3/S_4$ are indicated. But we
may test $\sigma_{RC}^2 = 0$ by the Interactions/Residual test of $S_4/S_5$ used in (35.65) for Model I.
We find, for the first time, that we have to distinguish different choices of the divisor
MS for the F-tests in an AV table.
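
Solving (36.33) for the variances, and choosing the divisor for each F-ratio, can be sketched as below. The function name `two_way_variance_components` is hypothetical, and the Mean Squares passed to it are not data: they are the expected values that (36.33) gives for $\sigma_e^2 = 1$, $\sigma_{RC}^2 = 2$, $\sigma_R^2 = 3$, $\sigma_C^2 = 4$ with $r = 4$, $c = 3$, $m = 2$, chosen so that the known variances are recovered.

```python
def two_way_variance_components(ms_rows, ms_cols, ms_inter, ms_resid, r, c, m):
    """Solve the expected-MS equations (36.33) for the four variances.

    Rows and Columns are compared with the Interactions MS, and only
    Interactions with the Residual MS, as Example 36.4 indicates.
    """
    return {
        "var_e": ms_resid,
        "var_rc": (ms_inter - ms_resid) / m,
        "var_r": (ms_rows - ms_inter) / (c * m),
        "var_c": (ms_cols - ms_inter) / (r * m),
        "F_rows": ms_rows / ms_inter,
        "F_cols": ms_cols / ms_inter,
        "F_inter": ms_inter / ms_resid,
    }


# lambda_2 = 23, lambda_3 = 37, lambda_4 = 5, lambda_5 = 1 from (36.33).
est = two_way_variance_components(23.0, 37.0, 5.0, 1.0, r=4, c=3, m=2)
print(est)
```

With observed Mean Squares in place of their expectations, any of the three differences may turn out negative, as noted above.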


Testing hypotheses in Model II AV

36.16 The remarks in Examples 36.3-4 draw our attention to the fact that we 
have given no theoretical justification so far for the use of F-tests in Model II AV. 
Even where the test coincides with that of Model I, its characteristics will be different 
and we cannot presume optimality; in any case, we have seen that new test statistics 
may be required. We must now consider the theory of Model II tests. 

Remembering that we are specializing to AV situations where the $(t-1)$ quadratic
forms in (36.26) are all Sums of Squares, we may write the multinormal distribution
of the observations in the form

    $dF \propto \exp\left\{-\tfrac{1}{2}\left[\frac{n(\bar{y} - \theta)^2}{\lambda_1^*} + \sum_{j=2}^{t} \frac{S_j}{\lambda_j}\right]\right\}$.    (36.34)

Now we observe that $\lambda_1^*$, defined as $\lambda_1 - n\theta^2$, is not a function of $\theta$ at all, for by 36.12
it is equal to $P_1 V_y P_1'$, i.e. $n^{-1}$ times the sum of all the elements of $V_y$, which are variances
and covariances, and therefore free of $\theta$. In every case we shall consider, $t = k+2$,
the number of parameters to be estimated. Thus we have $(k+2)$ latent roots which
are (in our case, always linear) functions of $(k+1)$ parameters $\sigma_j^2$. We can eliminate this
redundancy by expressing $\lambda_1^*$ in terms of the other latent roots, so that (36.34) becomes

    $dF \propto \exp\left\{-\tfrac{1}{2}\left[\frac{n\bar{y}^2 - 2n\bar{y}\theta}{\rho(\lambda_2, \ldots, \lambda_t)} + \sum_{j=2}^{t} \frac{S_j}{\lambda_j}\right]\right\}$.    (36.35)


Gautschi (1959) has shown the family of distributions 


Хел) exp È setate) (36.36) 
ј=1 


to be complete, a result not covered by (23.19), Vol. 2. We see at once that (36.35) 
is a case of (36.36) with 

„= 5, t= —1/(24), j22; 

һ=ў, тү=пф/(3„,...,); 


(Tas +++ s T7) = —ї@/р(А„...,3,). 
Thus (36.35) is complete with this parametrization. 


and 


36.17 We may now make use of the results of Chapter 23. We are debarred from
finding UMPU tests by the methods of 23.27-36, since (36.36) is more general than
(23.73). However, we may find UMP similar tests directly by the method of 23.20.

In our applications to balanced Model II AV, the latent roots are linear forms in
the parameters as exemplified in (36.33). The hypothesis which we wish to test (that
one particular $\sigma_j^2$, and that one alone, is zero) is always equivalent to testing that two
particular latent roots are equal, as can be verified at (36.33) or in the simpler case of
Example 36.3.

We first observe that $H_0 : \lambda_p = \lambda_q$; $q, p > 1$ leaves us with a set of $(k+1)$ complete
sufficient statistics

    $T = \{\bar{y}, S_2, \ldots, S_{p-1}, S_{p+1}, \ldots, S_{q-1}, S_{q+1}, \ldots, S_t, (S_p + S_q)\}$.

Now we assume that $\rho(\lambda)$ in (36.35) is a function of $\lambda_p$ or $\lambda_q$ but not of both. Following
23.20, we see that every similar region for $H_0$ consists of a fraction $\alpha$ of each contour
of constant $T$, which we now hold fixed. We write


AQ i) = Aa dos > Wwe; (86.37) 


For fixed $T$, use of the Neyman-Pearson lemma on (36.35) with (36.37) inserted
shows that the uniformly most powerful size-$\alpha$ critical region for testing $H_0$ against
$H_1 : \lambda_p > \lambda_q$ is given by

45,075)» (T). 
so that the UMP critical region for any fixed $T$ is given by small values of $S_q$, whatever
the parameter values. Since $(S_p + S_q)$ is held fixed by $T$, this critical region is equivalent
to large values of the ratio $S_p/S_q$. Finally, it is easy to see that this ratio is distributed
free of the parameters when $H_0$ holds, and is therefore by Exercise 23.7 independent
of the complete sufficient statistic $T$. Thus the test based on large values of $S_p/S_q$
is unconditionally UMP similar.

Thus we have established that the UMP similar test of equality of two Expected 
Mean Squares in a (balanced) Model II AV table, against a one-sided alternative 
hypothesis, is the F-test of the ratio of the (potentially) larger to the smaller of these 
MS, large values leading to rejection of the hypothesis tested. Essentially this result 
was first given by Herbach (1959). Exercise 36.4 is to show that the tests are also 
UMPU. They are not LR tests in general (cf. Exercises 36.5-6). 


Example 36.5

In Example 36.3, the Expected MS were:

    Groups: $\lambda_2 = \sigma_e^2 + n_i \sigma_1^2$;    Residual: $\lambda_3 = \sigma_e^2$.    (36.38)

Provided that $\sigma_e^2 > 0$, as we always assume, the hypothesis that $\lambda_2 = \lambda_3$ is identical
with $H_0 : \sigma_1^2 = 0$, and $H_1 : \lambda_2 > \lambda_3$ with $H_1 : \sigma_1^2 > 0$. Thus the test given in Example
36.3 is UMP similar by 36.17. It turns out that in this case $\lambda_1^* = \lambda_2$ exactly, but this
does not affect the above test (see Exercises 36.2-3). However, it implies that to test
the general mean $\theta$ in the Model II balanced one-way classification, its MS must be
tested against the Groups MS, and not against the Residual MS, as we found for
Model I in Example 35.1. (This is directly seen from the fact that the $p$ group means
are independent normal variates with mean $\theta$ and common variance, so that we may
test $\theta$ by the use of "Student's" $t^2$ with $(p-1)$ d.fr.) In Model II, this test for the
general mean holds whether or not $\sigma_1^2 = 0$, whereas in Example 35.1 the Model I
test held only when there were no between-group differences. Thus the models differ
even in this simplest case.


Example 36.6

In Example 36.4, (36.33) shows at once that if $\sigma_e^2 > 0$, $\sigma_R^2 = 0$ is equivalent to $\lambda_2 = \lambda_4$;
$\sigma_C^2 = 0$ to $\lambda_3 = \lambda_4$, and $\sigma_{RC}^2 = 0$ to $\lambda_4 = \lambda_5$. The tests of these hypotheses indicated
at the end of that Example are UMP similar by 36.17.

If we first test and accept the hypothesis that $\sigma_{RC}^2 = 0$ $(\lambda_4 = \lambda_5)$, it is tempting to
carry out the tests of $\sigma_R^2 = 0$ (or $\sigma_C^2 = 0$) by testing $S_2$ (or $S_3$) against the pooled SS
$(S_4 + S_5)$. Evidently the increase in the d.fr. of the denominator of the variance ratio
can bring an increase in the power of the test, but since the decision whether to pool
$S_4$ with $S_5$ depends on the previous test, it may be wrongly taken when $\lambda_4 \neq \lambda_5$. As
a result, control of the size of the overall test procedure becomes difficult. The numerically
complicated theory, and recommendations for such pooling procedures, are treated
by Bozivich et al. (1956) and Srivastava and Bozivich (1962).

If we can assume $\sigma_{RC}^2 = 0$ from the outset, these pooled tests of $\sigma_R^2$ and $\sigma_C^2$ are valid,
and coincide with the Rows and Columns tests in Model I (Example 35.2) when interactions
are assumed to be zero.

Because the Interactions MS is the denominator for the tests of row-effects and
of column-effects, we can make these tests even when every cell frequency $m = 1$.
This was not so in Model I (cf. Example 35.3) unless we were able to say that all interactions
were zero. Here, only the test of $\sigma_{RC}^2 = 0$ is lost when all $n_{ij} = 1$ and the
Residual SS is identically zero.


General balanced cross-classifications

36.18 The patterning of the expected MS given at (36.33) for the two-way cross-classification
is suggestive for higher-order balanced cross-classifications. For the
three-way cross-classification, it will similarly be found that the MS have expectations
given, in an obviously extended notation, by:

                                            E(MS)
    Rows (R):                    $\lambda_2 = \sigma_e^2 + m\sigma_{RCL}^2 + lm\sigma_{RC}^2 + cm\sigma_{RL}^2 + clm\sigma_R^2$
    Columns (C):                 $\lambda_3 = \sigma_e^2 + m\sigma_{RCL}^2 + lm\sigma_{RC}^2 + rm\sigma_{CL}^2 + rlm\sigma_C^2$
    Layers (L):                  $\lambda_4 = \sigma_e^2 + m\sigma_{RCL}^2 + cm\sigma_{RL}^2 + rm\sigma_{CL}^2 + rcm\sigma_L^2$
    First-order Interactions
      (R x C):                   $\lambda_5 = \sigma_e^2 + m\sigma_{RCL}^2 + lm\sigma_{RC}^2$
      (R x L):                   $\lambda_6 = \sigma_e^2 + m\sigma_{RCL}^2 + cm\sigma_{RL}^2$          (36.39)
      (C x L):                   $\lambda_7 = \sigma_e^2 + m\sigma_{RCL}^2 + rm\sigma_{CL}^2$
    Second-order Interactions
      (R x C x L):               $\lambda_8 = \sigma_e^2 + m\sigma_{RCL}^2$
    Residual:                    $\lambda_9 = \sigma_e^2$

This corresponds to the model, generalizing (36.28),
    $y_{ijkp} = \theta + u_{i..} + u_{.j.} + u_{..k} + u_{ij.} + u_{i.k} + u_{.jk} + u_{ijk} + \epsilon_{ijkp}$    (36.40)
    $(i = 1, 2, \ldots, r;\ j = 1, 2, \ldots, c;\ k = 1, 2, \ldots, l;\ p = 1, 2, \ldots, m)$,
with $\mathrm{var}(u_{i..}) = \sigma_R^2, \ldots, \mathrm{var}(u_{ij.}) = \sigma_{RC}^2, \ldots, \mathrm{var}(u_{ijk}) = \sigma_{RCL}^2$.


The rule of formation of (36.39) (and (36.33) and (36.38)) is now clear. Any
expected MS has $\sigma_e^2$ plus $m$ times a linear function of the variances. This linear function
contains every variance in the model which includes among its subscripts all the identifying
letters of the MS. The coefficient of each such variance is the product of the upper
limits of the suffixes in (36.40) corresponding to letters not included among the subscripts
of the variance; if all letters are included, the coefficient is unity. Thus, e.g.,
considering the expectation for the MS for the (R x L) Interactions, the only variances
containing both R and L among their subscripts are $\sigma_{RL}^2$ and $\sigma_{RCL}^2$. $\sigma_{RL}^2$ omits only
C from its subscripts, and the corresponding suffix in (36.40) is $j$, with upper limit $c$;
$\sigma_{RCL}^2$ includes all subscripts, and gets the coefficient unity. Thus we obtain $(c\sigma_{RL}^2 + \sigma_{RCL}^2)$,
to be multiplied by $m$ and added to $\sigma_e^2$, as given in (36.39). More general rules of
formation of expectations of MS, including this balanced Model II rule as a special
case, are given by Cornfield and Tukey (1956) (see also Scheffé (1959)).
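
The rule of formation just stated is mechanical enough to be written as a short routine. In the sketch below, the function name `expected_ms` and the dictionary layout are assumptions for illustration: each effect is mapped to the coefficients of the variances in its expected MS, with the term $\sigma_e^2$ (coefficient unity) understood throughout. The limits $r = 2$, $c = 3$, $l = 4$ and $m = 5$ are arbitrary.

```python
from itertools import combinations


def expected_ms(limits, m):
    """Coefficients of the variances in each expected MS (sigma_e^2 omitted).

    limits: upper limits of the suffixes, e.g. {"R": r, "C": c, "L": l}.
    m:      number of observations per cell.
    """
    factors = list(limits)

    def label(letters):
        return "".join(f for f in factors if f in letters)

    effects = [frozenset(cmb)
               for size in range(1, len(factors) + 1)
               for cmb in combinations(factors, size)]
    table = {}
    for effect in effects:
        terms = {}
        for variance in effects:
            if effect <= variance:          # variance carries all letters of the MS
                coef = m
                for f in factors:
                    if f not in variance:   # letters omitted from the variance
                        coef *= limits[f]
                terms[label(variance)] = coef
        table[label(effect)] = terms
    return table


table = expected_ms({"R": 2, "C": 3, "L": 4}, m=5)
print(table["RL"])   # coefficients mc and m
print(table["R"])    # coefficients clm, lm, cm and m
```

Called with only two letters, `expected_ms({"R": r, "C": c}, m)` gives the coefficients of the two-way table (36.33).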


36.19 (36.39) reveals a new feature of the three-way cross-classification, which
persists for all higher-order cases. In Examples 36.5-6, we saw that each hypothesis
of interest in the one-way and two-way cases (that some variance was zero) was identical
with the hypothesis of equality of two expected MS $\lambda_j$. In (36.39) it will be seen that
this remains true so far as $\sigma_{RCL}^2$, $\sigma_{RC}^2$, $\sigma_{RL}^2$ and $\sigma_{CL}^2$ are concerned: these are respectively
zero if and only if $\lambda_8 = \lambda_9$, $\lambda_5 = \lambda_8$, $\lambda_6 = \lambda_8$ or $\lambda_7 = \lambda_8$. Thus the second-order and
all first-order interactions can be tested by UMP similar F-tests, using 36.17. The
situation is different for the other variances $\sigma_R^2$, $\sigma_C^2$ and $\sigma_L^2$.
Consider $\sigma_R^2$, for example, contained in $\lambda_2$. $H_0 : \sigma_R^2 = 0$ cannot be expressed in
terms of the equality of two $\lambda_j$. Instead we observe that $\lambda_2 + \lambda_8 \ge \lambda_5 + \lambda_6$ and that
$H_0$ is identical with

    $H_0 : \lambda_2 + \lambda_8 = \lambda_5 + \lambda_6$.    (36.41)


The theory of 36.17 is of no use now, and, so far as we know, no investigation of the
optimum choice of test of (36.41) has been made. However, an approximate test
may be based on the result of Exercise 36.7, and a similar approximation is clearly
possible whenever we can express a hypothesis that a variance is zero equivalently
as a hypothesis that a linear function of the $\lambda_j$ is zero (see Exercise 36.8).
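
The equivalence of $\sigma_R^2 = 0$ with (36.41) follows by direct substitution from (36.39); the step is written out here for convenience.

```latex
\begin{align*}
\lambda_2 + \lambda_8 - \lambda_5 - \lambda_6
 &= (\sigma_e^2 + m\sigma_{RCL}^2 + lm\sigma_{RC}^2 + cm\sigma_{RL}^2 + clm\sigma_R^2)
  + (\sigma_e^2 + m\sigma_{RCL}^2) \\
 &\quad - (\sigma_e^2 + m\sigma_{RCL}^2 + lm\sigma_{RC}^2)
  - (\sigma_e^2 + m\sigma_{RCL}^2 + cm\sigma_{RL}^2) \\
 &= clm\,\sigma_R^2 \;\ge\; 0 .
\end{align*}
```

Every term other than $clm\,\sigma_R^2$ cancels, so the linear function of the latent roots vanishes if and only if $\sigma_R^2 = 0$.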


Hierarchical classifications 

36.20 The general theory of 36.13-15 and 36.16-17 applies to balanced hierarchical,
as well as cross-, classifications. In the balanced case, we may expeditiously make use
of the fact that a hierarchical classification may be regarded as an incomplete
cross-classification to obtain the additional column of expected MS required for the AV
table.


Example 36.7 Balanced two-way hierarchical classification 
Consider the model 
Xup = OU + Mist Eip (36.42) 
(-12..R; 4-52 weep ts, DPA 

corresponding to k groups at the first level of classification, (the same) l sub-groups 
within each of these, and m observations in each of the kl sub-groups. The AV table 
in Example 35.5 holds here, with the necessary changes in notation,(*) and we need 
only derive the expected value of each of its MS (except the general mean). We 
now note that (36.42) is a degenerate form of (36.28), in which we need only put 
$\sigma_C^2 = 0$ and relabel $u_{i.}$ as $u_i$. We also write $\sigma_R^2$ and $\sigma_{RC}^2$ as $\sigma_1^2$, $\sigma_2^2$ for our present purposes. 
With these changes, we find that the two latent roots of (36.33) belonging to columns 
and to interactions are equal, with a total multiplicity 
of $(l-1)+(k-1)(l-1) = k(l-1)$, the d.fr. for sub-groups. (36.33) now gives: 


                  E(MS) 
Groups:           $\sigma^2 + m\sigma_2^2 + lm\sigma_1^2$ 
Sub-groups:       $\sigma^2 + m\sigma_2^2$                    (36.43) 
Residual:         $\sigma^2$ 


Evidently, UMP similar tests of $\sigma_1^2 = 0$ and of $\sigma_2^2 = 0$ are available from 36.17. 
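
The AV table and the expected MS of (36.43) can be illustrated numerically. The following is only a sketch under stated assumptions: the function names (`nested_av_table`, `expected_ms`, `simulate`) are ours, not the text's, the general mean is taken as zero in the simulation, and the sums of squares are the usual ones for a balanced hierarchical layout with k groups, l sub-groups per group and m observations per sub-group.

```python
import random


def nested_av_table(y):
    """AV table for a balanced two-way hierarchical layout.

    y[i][j][p] is observation p in sub-group j of group i.
    Returns {name: (SS, d.fr., MS)}.
    """
    k, l, m = len(y), len(y[0]), len(y[0][0])
    sub_means = [[sum(cell) / m for cell in group] for group in y]
    group_means = [sum(row) / l for row in sub_means]
    grand_mean = sum(group_means) / k  # equals the overall mean when balanced

    ss_groups = l * m * sum((g - grand_mean) ** 2 for g in group_means)
    ss_sub = m * sum((sub_means[i][j] - group_means[i]) ** 2
                     for i in range(k) for j in range(l))
    ss_resid = sum((obs - sub_means[i][j]) ** 2
                   for i in range(k) for j in range(l) for obs in y[i][j])

    rows = (("Groups", ss_groups, k - 1),
            ("Sub-groups", ss_sub, k * (l - 1)),
            ("Residual", ss_resid, k * l * (m - 1)))
    return {name: (ss, df, ss / df) for name, ss, df in rows}


def expected_ms(l, m, var_resid, var_sub, var_group):
    """Expected MS in the pattern of (36.43)."""
    return {"Groups": var_resid + m * var_sub + l * m * var_group,
            "Sub-groups": var_resid + m * var_sub,
            "Residual": var_resid}


def simulate(k, l, m, var_resid, var_sub, var_group, rng):
    """One normal sample from model (36.42), general mean assumed zero."""
    y = []
    for _ in range(k):
        u_i = rng.gauss(0.0, var_group ** 0.5)
        group = []
        for _ in range(l):
            u_ij = rng.gauss(0.0, var_sub ** 0.5)
            group.append([u_i + u_ij + rng.gauss(0.0, var_resid ** 0.5)
                          for _ in range(m)])
        y.append(group)
    return y
```

Averaging the MS column of `nested_av_table` over many calls to `simulate` should approach the values returned by `expected_ms`; a single sample will of course scatter about them.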


Example 36.8 Balanced three-way hierarchical classification 


By specializing (36.40) in exactly the same way as in the previous Example, (36.39) 
becomes (the reader is left to do this as Exercise 36.9): 


(*) Notice particularly that Z in Example 35.5 corresponds to kl here, the total number of 
sub-groups. 

                   E(MS) 
Groups:            $\sigma^2 + m\sigma_3^2 + l_3 m\sigma_2^2 + l_2 l_3 m\sigma_1^2$ 
Sub-groups:        $\sigma^2 + m\sigma_3^2 + l_3 m\sigma_2^2$ 
Sub-sub-groups:    $\sigma^2 + m\sigma_3^2$                    (36.44) 
Residual:          $\sigma^2$ 


Here, there are $l_1$ groups each containing $l_2$ sub-groups, each of which contains $l_3$ 
sub-sub-groups, and m observations in each of the latter, making $n = l_1 l_2 l_3 m$ in all. 
Again there is no difficulty in obtaining tests by the method of 36.17—the problem 
which arose for higher-way cross-classifications in 36.19 was produced by the multi- 
plicity of interactions, which do not enter into the present problem. 
Scheffé (1959) gives an extended treatment of the three-way hierarchical case with 
possibly unequal frequencies. 


Power of tests, confidence intervals and negative estimates 

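36.21 By 36.17, a UMP similar F-test for the hypothesis that some variance is 
zero is equivalent to testing $H_0: \phi = 0$ in $\lambda_p = \lambda_q + \phi$ against $H_1: \phi > 0$. The ratio 
of $S_p/(\lambda_q+\phi)$ to $S_q/\lambda_q$ always has a central F-distribution, so that the 
variance-ratio statistic F of the test satisfies 

    $$ F \Big/ \left(1 + \frac{\phi}{\lambda_q}\right) \sim \text{central } F. $$    (36.45) 


This leads immediately to the power of the test of $H_0$ based on the statistic $S_p/S_q$, 
Exercise 36.1 being the simplest case. 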


36.22 Whereas in Model I we were led to consider multiple comparisons between 
the parameters (means) of the model, the natural next step in Model II is to consider 
confidence intervals for the variance parameters. 

(36.45) leads immediately to confidence intervals for the parameter $\phi/\lambda_q$, of which 
Exercise 36.11 is the simplest case. These intervals may cover (or even consist entirely 
of) negative values, as even that simplest case shows. This runs parallel to the possible 
negativeness of the point estimators of variances which we found in Examples 36.3-4. 
For practical purposes, a negative point estimate of a non-negative quantity is inadmiss- 
ible, and is therefore usually replaced by the estimate zero. Although this removes the 
unbiassedness of the estimator, it may reduce mean-square-error (cf. Example 36.3). 
Similarly, the negative portion of a confidence interval is usually replaced by the value 
zero. 
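
The replacement of a negative estimate by zero can be sketched for the balanced one-way classification. This is an illustration only: the function name is ours, and it assumes the standard expectations E(MS between) = $\sigma^2$ + (group size)$\sigma_1^2$ and E(MS residual) = $\sigma^2$.

```python
def one_way_variance_estimates(ms_between, ms_residual, group_size):
    """Return (unbiassed, truncated) estimates of sigma_1^2.

    The unbiassed estimate is (MS between - MS residual) / group_size and may
    be negative; the truncated estimate replaces a negative value by zero,
    giving up unbiassedness.
    """
    unbiassed = (ms_between - ms_residual) / group_size
    truncated = max(0.0, unbiassed)
    return unbiassed, truncated
```

For instance, with a between-groups MS of 2.0, a residual MS of 5.0 and groups of size 4, the unbiassed estimate is -0.75 and the truncated estimate is 0.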

Thompson (1962) gives an algorithm for obtaining non-negative estimates of vari- 
ances which gives intuitively acceptable results in the one-way and two-way cross- 
classifications, but becomes complicated even in the three-way case. 

Bulmer (1957), see also Scheffé (1959), obtains approximate confidence intervals 
for $\phi$ itself in (36.45), rather than the less useful intervals for $\phi/\lambda_q$ already mentioned 
(cf. Exercise 36.15). 

There is no difficulty in constructing confidence intervals for the error variance $\sigma^2$ 
from the distribution of the Residual SS in every case; these are never negative. 


The unbalanced case in Model II 

36.23 Since Example 36.2, we have been confining ourselves to the balanced 
(equal frequencies) case. To exemplify the difficulties of the unbalanced case in 
Model II, and throw some light on their origin, we now return to the one-way classifica- 
tion, for which the model is given in Example 36.1. 


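36.24 We see that 

    $$ \operatorname{cov}(y_{ij}, y_{rs}) = \begin{cases} 0 & \text{if } i \neq r, \\ \sigma_1^2 & \text{if } i = r,\ j \neq s, \end{cases} $$ 

so that the multinormality assumption implies that observations which are not in the 
same one of the p groups are independent, while those within the same group have 
covariance $\sigma_1^2$; all have zero means and variances $\sigma_1^2+\sigma^2$. (This is the simplest case 
of the general formula (36.11).) 

To write down the exponent of the multinormal distribution, we must invert V, 
which, as we have just seen, contains only zeros apart from p square submatrices 
along its leading diagonal, of dimensions $n_1, n_2, \ldots, n_p$ (the numbers of observations 
in the p groups), each of which is of form 

    $$ V_i = \begin{pmatrix} \sigma_1^2+\sigma^2 & \sigma_1^2 & \cdots & \sigma_1^2 \\ \sigma_1^2 & \sigma_1^2+\sigma^2 & \cdots & \sigma_1^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_1^2 & \sigma_1^2 & \cdots & \sigma_1^2+\sigma^2 \end{pmatrix} $$ 

with inverse 

    $$ V_i^{-1} = \frac{1}{\sigma^2(\sigma^2+n_i\sigma_1^2)} \begin{pmatrix} \sigma^2+(n_i-1)\sigma_1^2 & -\sigma_1^2 & \cdots & -\sigma_1^2 \\ -\sigma_1^2 & \sigma^2+(n_i-1)\sigma_1^2 & \cdots & -\sigma_1^2 \\ \vdots & \vdots & \ddots & \vdots \\ -\sigma_1^2 & -\sigma_1^2 & \cdots & \sigma^2+(n_i-1)\sigma_1^2 \end{pmatrix}, $$ 

which may be verified by multiplication. Thus $V^{-1}$ contains the $V_i^{-1}$ along its leading 
diagonal, and zeros otherwise. 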
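
The verification by multiplication can also be carried out mechanically. The sketch below is illustrative only (the helper names are ours); it builds one diagonal block with $\sigma_1^2+\sigma^2$ on the diagonal and $\sigma_1^2$ elsewhere, builds the stated inverse, and multiplies them in exact rational arithmetic.

```python
from fractions import Fraction


def v_block(n_i, var_error, var_group):
    """n_i x n_i block: var_group everywhere, plus var_error on the diagonal."""
    return [[var_group + (var_error if a == b else 0) for b in range(n_i)]
            for a in range(n_i)]


def v_block_inverse(n_i, var_error, var_group):
    """Stated inverse: common scale factor times a constant-diagonal matrix."""
    scale = Fraction(1) / (var_error * (var_error + n_i * var_group))
    diag = var_error + (n_i - 1) * var_group
    return [[scale * (diag if a == b else -var_group) for b in range(n_i)]
            for a in range(n_i)]


def matmul(a, b):
    """Plain matrix product of two square lists-of-lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]
```

Using `Fraction` inputs keeps the check exact, so the product can be compared with the identity matrix without any tolerance.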


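36.25 The logarithm of the LF for the one-way classification is, from 36.24, 

    $$ \log L = c(\sigma^2, \sigma_1^2) - \frac{1}{2} \sum_{i=1}^{p} \frac{1}{\sigma^2(\sigma^2+n_i\sigma_1^2)} \Big\{ [\sigma^2+(n_i-1)\sigma_1^2] \sum_{j} (y_{ij}-\theta)^2 - \sigma_1^2 \sum_{j \neq k} (y_{ij}-\theta)(y_{ik}-\theta) \Big\} $$ 

    $$ = c(\sigma^2, \sigma_1^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{p} \sum_{j=1}^{n_i} (y_{ij}-\bar{y}_{i.})^2 - \frac{1}{2} \sum_{i=1}^{p} \frac{n_i(\bar{y}_{i.}-\theta)^2}{\sigma^2+n_i\sigma_1^2}. $$    (36.46) 

Since $\bar{y}_{..} = \sum_i n_i \bar{y}_{i.}/n$, it is obvious from (36.46) that the (p+1) statistics 
$\{\sum_i \sum_j (y_{ij}-\bar{y}_{i.})^2,\ \bar{y}_{1.}, \bar{y}_{2.}, \ldots, \bar{y}_{p.}\}$ are sufficient for the three parameters of the problem. If the 
$n_i$ are all equal (to n/p), the last summation on the right of (36.46) may be written 

    $$ \sum_{i=1}^{p} \frac{n_i(\bar{y}_{i.}-\theta)^2}{\sigma^2+n_i\sigma_1^2} = \frac{1}{\sigma^2+(n/p)\sigma_1^2} \Big\{ \sum_{i=1}^{p} n_i(\bar{y}_{i.}-\bar{y}_{..})^2 + n(\bar{y}_{..}-\theta)^2 \Big\}, $$    (36.47) 

and (36.47) inserted into (36.46) shows that we then have the minimal sufficient set of 
three statistics $\{\sum_i \sum_j (y_{ij}-\bar{y}_{i.})^2,\ \bar{y}_{..},\ \sum_i n_i(\bar{y}_{i.}-\bar{y}_{..})^2\}$, in accordance with the general 
result of 36.11 for the balanced case. These are, of course, essentially the quantities 
entering into the AV table discussed in Example 36.3. 

However, if the $n_i$ are not all equal, (36.47) does not hold, and the minimal sufficient 
statistic has more than three components (cf. Exercise 36.12). 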


36.26 The AV is severely affected by the lack of balance. The “between groups” SS 

    $$ S_2 = \sum_{i=1}^{p} n_i(\bar{y}_{i.}-\bar{y}_{..})^2 $$ 

is no longer distributed as a multiple of a chi-squared variable, 
for it is a weighted sum of squares about their mean of normal variables with zero 
means but unequal variances. However, the distribution of the Residual SS $S_1$ is 
unchanged, so that 

    $$ E\{S_1/(n-p)\} = \sigma^2 $$    (36.48) 
as before. 

One can still estimate $\sigma_1^2$ from the AV table, but this is no longer a unique optimum 
procedure, as it was in the balanced case of Example 36.3. We saw in 36.25 that $S_1$ 
and the p group means are always sufficient statistics. Consider the function of the 
group means 

    $$ S(m_i) = \sum_{i=1}^{p} m_i \bar{y}_{i.}^2 - \Big(\sum_{i=1}^{p} m_i \bar{y}_{i.}\Big)^2 \Big/ \sum_{i=1}^{p} m_i, $$ 

where the $m_i$ are constants. Since, from Example 36.3, 

    $$ \operatorname{var} \bar{y}_{i.} = \sigma_1^2+\sigma^2/n_i $$    (36.49) 

we see that 

    $$ E\{S(m_i)\} = \sum_{i=1}^{p} m_i \Big(1 - \frac{m_i}{\sum_k m_k}\Big) \Big(\sigma_1^2 + \frac{\sigma^2}{n_i}\Big). $$    (36.50) 

(36.50) can be solved with (36.48) to give an unbiassed estimator of $\sigma_1^2$, whatever the 
$m_i$ used, and we thus obtain a multiplicity of unbiassed estimators of $\sigma_1^2$ (except in 
the balanced case (cf. Exercise 36.13)). The “natural” choice $m_i = n_i$, which reduces 
$S(m_i)$ to $S_2$, is convenient but not optimum. 

This lack of uniqueness demonstrates that (as usual when the dimension of the 
vector of sufficient statistics exceeds the number of parameters) we have lost the com- 
pleteness of the sufficient statistic in the unbalanced case. 
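
The multiplicity of unbiassed estimators can be made concrete. The sketch below is an assumption-laden illustration (the function names are ours): it takes the weighted sum of squares of the group means about their weighted mean, computes the coefficients a and b in E{S(m)} = a $\sigma_1^2$ + b $\sigma^2$ from the variances $\sigma_1^2 + \sigma^2/n_i$ of independent group means, and solves for $\sigma_1^2$ using the residual MS as the estimate of $\sigma^2$.

```python
from fractions import Fraction


def s_statistic(weights, group_means):
    """S(m) = sum m_i*ybar_i^2 - (sum m_i*ybar_i)^2 / sum m_i."""
    total_w = sum(weights)
    weighted_sum = sum(w * y for w, y in zip(weights, group_means))
    return (sum(w * y * y for w, y in zip(weights, group_means))
            - weighted_sum ** 2 / total_w)


def expectation_coefficients(weights, group_sizes):
    """Return (a, b) such that E{S(m)} = a*sigma_1^2 + b*sigma^2."""
    total_w = sum(weights)
    a = sum(Fraction(w) * (total_w - w) / total_w for w in weights)
    b = sum(Fraction(w) * (total_w - w) / (total_w * n)
            for w, n in zip(weights, group_sizes))
    return a, b


def unbiassed_sigma1_sq(weights, group_sizes, group_means, residual_ms):
    """Solve E{S(m)} = a*sigma_1^2 + b*sigma^2 with sigma^2 estimated by residual_ms."""
    a, b = expectation_coefficients(weights, group_sizes)
    return (s_statistic(weights, group_means) - b * residual_ms) / a
```

Different weight vectors (for example `weights = group_sizes` against equal weights) generally give different, equally unbiassed, estimates from the same data.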

Spjøtvoll (1967) shows that the test of $\sigma_1^2 = 0$ using $S_2/S_1$ is almost equivalent, against 
large alternative values of $\sigma_1^2$, to the most powerful invariant test, the exact form of which 
depends on the value of the alternative considered. He also shows that the analogous 
correspondence exists with Wald's (1940) confidence interval for $\sigma_1^2/\sigma^2$ (cf. Exercise 
36.16) which was generalized by Wald (1941, 1947) to more complicated unbalanced 
models. 

For local alternatives to $\sigma_1^2 = 0$, Mostafa (1967) shows that the statistic 
$\sum_{i=1}^{p} n_i^2(\bar{y}_{i.}-\bar{y}_{..})^2$ is considerably more powerful than $S_2$. 


36.27 Tukey (1956-7) considered the problem of optimum estimation in the 
unbalanced one-way classification, with complicated results which should be consulted. 
Searle (1958) and Low (1964) investigated the unbalanced two-way cross-classification, 
where again the SS in the AV table are no longer chi-squared multiples. Henderson 
(1953) considered several methods of unbiassed estimation in the general unbalanced 
case; Harville (1967a) investigated two of them for the two-way classification, also obtain- 
ing necessary and sufficient conditions for unbiassed estimation to be possible. 


Generalization of AV models: discussion 

36.28 Both the LS general linear Model I of Chapter 35 and Model II of the 
present chapter are “extreme” models, in the sense that all the elements of the vector $\theta$ 
in (19.8) are constants (parameters) while all the elements of u in (36.1) are random 
variables, uncorrelated with each other and with the error-vector $\epsilon$. In practice, we 
may meet situations where a mixture of the two models is apposite, i.e. where 

    $$ y = 1\theta_0 + X_1\theta + X_2 u + \epsilon, $$    (36.51) 

where $\theta_0$ is now used for the general mean to distinguish it from the parameter-vector $\theta$ 
of constants. 


36.29 If we confine ourselves from the outset to discussion of AV situations (which 
we did not do until Chapter 35 for Model I and until 36.13 for Model II), we can 
easily see how “ mixed” models of the form (36.51) may arise. Consider for illus- 
tration a two-way cross-classification. 

Suppose that an experiment is to investigate the breaking strengths of five different 
types of paper when wet, and that three different levels of moisture content are to be 
used with each type of paper. This will give a table with five rows and three columns 
for the results of the tests. The five types of paper have been selected for the experi- 
ment because they are of intrinsic interest—we want to know how these papers behave. 
It is therefore reasonable to regard their breaking strengths as constants (parameters) 
subject to the usual experimental errors of determination. So far as the row- 
classification is concerned, Model I is quite appropriate. 

The situation may be different for the column-classification. The three levels of 
moisture-content will probably have been chosen as convenient levels to represent 
“high,” “medium” and “low” content, there being nothing sacrosanct about the 
precise levels used. In this sense, the three levels used will have been chosen from a 
population of potential levels of moisture-content in some (not necessarily probabilistic) 
way. Thus the column-effects will have some kind of distribution, quite apart from 
experimental error. So far as the column-classification is concerned, therefore, 
Model II (without the normality assumption) is a more reasonable idealization of the 
experiment, though by no means a perfect one. We are therefore led in the first 
instance to represent this experiment by a model of the form (36.51), with $\theta$ standing 
for row-effects and u for column-effects. 


36.30 We may, further, consider the interactions of rows and columns in the 
experiment of 36.29, i.e., as in 35.18, allow for the possibility that the five types of 
paper have different relative patterns of breaking strengths at the different levels of 
moisture-content. Since the column-effects are themselves idealized as random 
variables, it seems logically necessary that their interactions with the rows must also 
be treated in this way, rather than as constants. We easily achieve this by allowing u 
in (36.51) to have further components to represent the interactions. However, this 
leads us immediately to consider a point which also applies to Model II itself: if we 
allow that column-effects and row-column interactions are random variables, what 
justification can there be for assuming that they are uncorrelated, especially when we 
recall that the introduction of the normality assumption into the model then implies 
independence? 

On grounds of realism, independence of random effects and interactions can hardly 
be allowed, and Model II, while retaining its mathematical interest, seems unlikely to 
do full justice to many practical situations involving interactions. 


36.31 The fact that Model II AV is identified with the generally unacceptable 
assumption of independent interactions, as we shall now call it, stimulated the develop- 
ment of other more general AV models which could be freed from this assumption. 
We shall see that models with tied interactions, i.e. interactions correlated with the 
corresponding main effects, do indeed lead to a different AV procedure from that of 
either Model I or Model II. Recent expositions, with some historical detail, are given 
by Plackett (1960) and Scheffé (1956b). 

We now examine AV models developed by Cornfield and Tukey (1956) (following 
earlier unpublished work by these authors), Scheffé (1956a), and by Wilk and Kemp- 
thorne (1955-6). We confine our discussion to the two-way cross-classification, which 
exhibits all the important features of the models. 


A general model 

36.32 Suppose that we have a population (discrete or continuous) of possible 
levels for the row-classification, from which r levels are selected for use in an experi- 
ment, and call this population $P_R$. Similarly, suppose that there is a population $P_C$ 
of possible levels for the column-classification, from which the c levels used are selected. 
The selection process for rows is assumed independent of that for columns. If row- 
level i and column-level j are selected, $n_{ij}$ observations are to be made on this combina- 
tion. The pth observation in the ith row and jth column will be written $y_{ijp}$ as before, 
and we let 

    $$ y_{ijp} = \mu_{ijp} + \epsilon_{ijp} $$    (36.52) 

where $E(\epsilon_{ijp}) = 0$. We leave aside for the moment the detailed consideration of how 
the $n_{ij}$ required experimental units are to be allocated to the (i, j)th selected row-column 
combination (see 36.36 and 36.39 below), but we let $N_{ij}$ denote the number of experi- 
mental units which could (by the structure of the experiment) so be allocated, and 
define $\mu_{ij*}$ to be the average value of $\mu_{ijp}$ calculated over its $N_{ij}$ possible values. Further, 
$\mu_{i**}$ and $\mu_{*j*}$ are averages of $\mu_{ij*}$ over all the levels in $P_C$ and all the levels in $P_R$ respec- 
tively, while $\mu_{***}$ is an average of $\mu_{ij*}$ over both $P_R$ and $P_C$. No assumption of normality 
or homoscedasticity is made. 


36.33 The general mean, row-effects, column-effects and row-column interactions 
are now respectively defined, by analogy with (35.31), as 

    $$ \theta_{..} = \mu_{***}, $$ 
    $$ \theta_{i.} = \mu_{i**} - \mu_{***}, $$ 
    $$ \theta_{.j} = \mu_{*j*} - \mu_{***}, $$                                        (36.53) 
    $$ \theta_{ij} = \mu_{ij*} - \mu_{i**} - \mu_{*j*} + \mu_{***} $$ 
    $$ \phantom{\theta_{ij}} = \mu_{ij*} - \theta_{i.} - \theta_{.j} - \theta_{..}. $$ 

Although only r values of i and c values of j will be observed, all our definitions of the 
$\theta$'s are made in terms of the $\mu$'s, which are functions of the populations of levels, rather 
than only those actually appearing in the experiment as it is carried out. Further, 
the $\theta_{ij}$ are tied interactions, related to the population effects $\theta_{i.}$ and $\theta_{.j}$ through the 
last line of (36.53). 

From (36.53) we may now define the components of variation. If $P_R$ has R members 
and $P_C$ has C members, we write 

    $$ \sigma_R^2 = \sum_{i=1}^{R} \theta_{i.}^2 / (R-1), $$ 
    $$ \sigma_C^2 = \sum_{j=1}^{C} \theta_{.j}^2 / (C-1), $$                              (36.54) 
    $$ \sigma_{RC}^2 = \sum_{i=1}^{R} \sum_{j=1}^{C} \theta_{ij}^2 / \{(R-1)(C-1)\}, $$ 

where R or C or both may be allowed to tend to infinity, e.g. in the continuous case. 
The residual variance is defined by 
eat & £u) y (36.55) 
ы жәл 9—1) tay ун s E 
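
For finite populations of levels, the effects and components of variation defined from the table of cell averages can be computed directly. The sketch below is illustrative only: the function names are ours, the input `mu[i][j]` is assumed to hold the cell average for every one of the R row-levels and C column-levels of the populations, and exact rational arithmetic is used so that the "tied" property (interactions summing to zero over each row and each column) can be checked exactly.

```python
from fractions import Fraction


def population_effects(mu):
    """Return (general mean, row effects, column effects, interactions)."""
    R, C = len(mu), len(mu[0])
    mu = [[Fraction(x) for x in row] for row in mu]
    grand = sum(sum(row) for row in mu) / (R * C)
    row_eff = [sum(row) / C - grand for row in mu]
    col_eff = [sum(mu[i][j] for i in range(R)) / R - grand for j in range(C)]
    inter = [[mu[i][j] - row_eff[i] - col_eff[j] - grand for j in range(C)]
             for i in range(R)]
    return grand, row_eff, col_eff, inter


def variance_components(mu):
    """Components of variation with divisors R-1, C-1 and (R-1)(C-1)."""
    R, C = len(mu), len(mu[0])
    _, row_eff, col_eff, inter = population_effects(mu)
    var_r = sum(t * t for t in row_eff) / (R - 1)
    var_c = sum(t * t for t in col_eff) / (C - 1)
    var_rc = sum(t * t for row in inter for t in row) / ((R - 1) * (C - 1))
    return var_r, var_c, var_rc
```

For the 2 x 3 table [[1, 2, 3], [2, 4, 9]] this gives a general mean of 7/2, row effects -3/2 and 3/2, and components 9/2, 21/4 and 7/2 respectively.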


36.34 We now specialize the general model in various ways. 


Case 1 
$P_R$ has R = r members only, and $P_C$ has C = c members only, and no sampling 
of row- and column-levels takes place; $N_{ij} \to \infty$ for all i, j. 
This is the basis for Model I AV in Chapter 35. (36.53) becomes identical with 
(35.31). 


Case 2 

Both $P_R$ and $P_C$ are continuous populations, so that R and C $\to \infty$, as does $N_{ij}$ 
for all i, j. 
This is the basis for Model II AV in this chapter, although we have here made 
no assumption of normality or homoscedasticity. The last equation of (36.53) gives 

    $$ \mu_{ij*} = \theta_{..} + \theta_{i.} + \theta_{.j} + \theta_{ij}, $$    (36.56) 

and all the terms on the right except $\theta_{..}$ are random variables. The interactions $\theta_{ij}$ 
are now effectively made independent of the $\theta_{i.}$ and $\theta_{.j}$ by the infiniteness of the 
population from which they are sampled. In fact, (36.56) is equivalent, apart from 
changes of notation, to (36.28) (written for $y_{ij.}$ instead of the individual observa- 
tions $y_{ijp}$). 


Case 3 

The observed r row-levels are selected at random without replacement from the R 
levels in $P_R$, and the observed c column-levels are, independently of the row selections, 
similarly selected from the C levels in $P_C$. 

This is a model in which both classifications have random effects, as in Model II, 
but with tied interactions. It reduces to Case 2 when R, C and $N_{ij} \to \infty$. 

It should be noted that even when r = R, c = C and $N_{ij} \to \infty$, Case 3 is not 
identical with Case 1 above, since there was no sampling at all in Case 1, whereas 
here the sampling process determines the order in which the r rows and c columns are 
labelled 1, 2, . . . , r and 1, 2, . . . , c respectively. The sampling process effects 
a randomization of the rows and (independently) of the columns, of the r x c table. 
We shall call this Case 3(1). 


Mixed models 
36.35 Case 4 

In Case 3 above, suppose that R = r, so that there is only a permutation of rows, 
which are otherwise fixed, but that, as in Case 2, C and $N_{ij} \to \infty$. 

This is a mixed model, with row-levels fixed apart from permutations but the 
column-levels selected from an infinite population. The column-vector $\{\mu_{ij*}\}$, with 
r components obtained by giving i the values 1, 2, . . . , r, is a random vector because 
of the selection of columns. For different j, the $\{\mu_{ij*}\}$ are assumed mutually inde- 
pendent with the same multivariate distribution. However, this model is not as 
general as might at first appear, for the row-permutation process implies that the 
variances of the elements of $\{\mu_{ij*}\}$ are all equal, and similarly that their $\frac{1}{2}r(r-1)$ 
covariances are equal. This condition of complete symmetry is clearly not always 
desirable in practice. We therefore consider a further generalization, due to Scheffé 
(1956a), which we shall call Case 5. 


Case 5 
R = r as in Case 1, with no sampling, C $\to \infty$, and as in Case 4, the $\{\mu_{ij*}\}$ are identi- 
cally distributed random vectors. The dispersion matrix of their r components no 
longer has complete symmetry necessarily imposed upon it by a row-permutation 
process as in Case 4. This is therefore a more general mixed model than that of 
Case 4. 


Imhof (1960) generalizes Case 5 to the balanced three-way classification with two 
random (and one fixed) classification variables. 


36.36 Cornfield and Tukey (1956) give a detailed treatment of Case 3 of 36.34 
(of which Cases 2 and 4 are specializations) not only for the two-way cross-classification 
to which we are confining ourselves, but for general balanced classifications. In all 
cases, they assume that for each selected row-column combination the $n_{ij}$ observations 
are made upon experimental units selected at random without replacement from a 
distinct population of $N_{ij}$ experimental units, and that these rc separate populations 
are all sampled independently of the row- and column-levels selections already dis- 
cussed. For the balanced two-way cross-classification with all $n_{ij} = m$ and also all 
$N_{ij} = N$, their results for the expected MS in the AV table are given in the first two 
columns of (36.57). The remaining three columns specialize these results. 


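Expected values of MS in AV table 

                                                                      Special cases of Case 3 

MS           | Case 3                                                       | Case 3(1)                   | Case 2                                    | Case 4 
             |                                                              | (R = r, C = c, N → ∞)       | (R, C, N → ∞)                             | (R = r; C, N → ∞) 
Rows         | $\sigma^2(1-m/N) + m\sigma_{RC}^2(1-c/C) + cm\sigma_R^2$     | $\sigma^2 + cm\sigma_R^2$   | $\sigma^2 + m\sigma_{RC}^2 + cm\sigma_R^2$ | $\sigma^2 + m\sigma_{RC}^2 + cm\sigma_R^2$ 
Columns      | $\sigma^2(1-m/N) + m\sigma_{RC}^2(1-r/R) + rm\sigma_C^2$     | $\sigma^2 + rm\sigma_C^2$   | $\sigma^2 + m\sigma_{RC}^2 + rm\sigma_C^2$ | $\sigma^2 + rm\sigma_C^2$ 
Interactions | $\sigma^2(1-m/N) + m\sigma_{RC}^2$                           | $\sigma^2 + m\sigma_{RC}^2$ | $\sigma^2 + m\sigma_{RC}^2$               | $\sigma^2 + m\sigma_{RC}^2$ 
Residual     | $\sigma^2$                                                   | $\sigma^2$                  | $\sigma^2$                                | $\sigma^2$ 

                                                                                                                        (36.57) 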


It will be seen that the entries in the Case 2 column are identical with (36.33) in 
Example 36.4, i.e. with the Model II results, as stated under Case 2 in 36.34. 

On the other hand, despite the distinction made between Case 3(1) and Case 1 in 
36.34, the entries in the Case 3(1) column in (36.57) are identical with the Model I 
results in Example 35.2 at (35.65), when we recall that a non-central chi-squared variate 
has mean value equal to the sum of its non-central parameter and its d.fr. and use the 
definitions (36.54) with R = r, C = c. The randomization of rows and columns in 
Case 3(1) does not affect the expected MS because the SS are symmetric in row-levels 
and in column-levels. 
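
The way the special cases follow from the general Case 3 entries can be checked by direct substitution. The sketch below is an illustration under assumptions: the function and argument names are ours, the general formulas are those of the Case 3 column as read here (with finite-population factors 1 - m/N, 1 - c/C and 1 - r/R), and an infinite population is represented by `math.inf`, for which the sampling fraction is taken as zero.

```python
import math


def sampling_fraction(sample, population):
    """sample/population, taken as 0 for an infinite population."""
    return 0.0 if math.isinf(population) else sample / population


def expected_ms_case3(r, R, c, C, m, N, var_e, var_r, var_c, var_rc):
    """Expected MS for the balanced two-way cross-classification, Case 3."""
    error_part = (1 - sampling_fraction(m, N)) * var_e
    return {
        "Rows": error_part + m * (1 - sampling_fraction(c, C)) * var_rc
                + c * m * var_r,
        "Columns": error_part + m * (1 - sampling_fraction(r, R)) * var_rc
                   + r * m * var_c,
        "Interactions": error_part + m * var_rc,
        "Residual": var_e,
    }
```

Setting R = r, C = c with N infinite gives the Case 3(1) column; all three populations infinite gives Case 2; and R = r with C and N infinite gives Case 4.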


36.37 The entries under Case 4 in (36.57) are new to us, and we see that the 
expected MS for Rows (which are fixed apart from permutation) is identical with its 
value in Model II (where row-effects are random), while the expected MS for Columns 
(which are random here) is identical with its value in Model I (where column-effects 
are fixed). In fact, the expected MS for Rows is determined by the sampling in the 
columns classification, and vice versa. The Rows MS is concerned with differences 
between row-levels, and the whole population of these is observed, but in association 
with only a sample of column-levels; the population interaction between row- and 
column-levels is therefore relevant, and $\sigma_{RC}^2$ appears in the expected MS. The observed 
sample of column-levels, however, is associated with every possible row-level, so the 
Columns MS does not depend on that population interaction, and $\sigma_{RC}^2$ does not enter 
into the expectation of this MS. The phenomenon is general in more complex classi- 
fications: the sampling of levels of the other classifications than the one under con- 
sideration determines the structure of the latter's expected MS. 


36.38 The expected MS in Case 5 are the same as those given for Case 4 in (36.57). 
As in the Case 3(1) discussion at the end of 36.36, this is because the SS are symmetric 
functions of the rows. However, an important distinction between Cases 4 and 5 
arises as soon as we introduce the multinormality assumption upon the $\{\mu_{ij*}\}$. It 
then appears (see Scheffé (1956a, 1959) for a detailed discussion) that the mixed model 
of Case 5 does not retain the simplicity of Model II, where we found (cf. 36.15-17) 
that the ratio of each SS in the AV table to its expected MS has a central chi-squared 
distribution, and as a consequence were able to obtain UMP similar F-tests by testing 
each MS against another with the same expectation on the hypothesis. It remains 
true in Case 5, as the last column of (36.57) suggests, that (if m>1) $\sigma_{RC}^2 = 0$ may be 
tested by an F-test on Interactions MS/Residual MS, and $\sigma_C^2 = 0$ may be tested 
similarly. But the statistic Rows MS/Interactions MS does not in general have an F- 
distribution, even though its numerator and denominator are independent with the 
same expectations when $\sigma_R^2 = 0$. Scheffé (1956a, 1959) gives an alternative test 
statistic distributed in the Hotelling’s $T^2$ form to be discussed generally in 41.15-17. 
In Case 4, however, the statistic Rows MS/Interactions MS remains distributed in the 
variance-ratio form; see Imhof (1962), who finds that the effects of using the inappro- 
priate variance-ratio test in Case 5 can be considerable, and suggests an alternative 
F-test. A similar investigation was carried out for a generalization of Case 5, appro- 
priate to the analysis of a group of experiments, by Calinski (1966). 


S. N. Roy and Cobb (1960) consider the mixed model (no interactions) with normal 
errors and one or more random effects which are non-normally distributed. Hartley 
and Rao (1967) give a ML procedure for the general unbalanced mixed model with 
normal errors. 

We mention briefly that some sequential AV procedures have been developed by 
D. R. Cox (1952), Johnson (1953-4) and B. K. Ghosh (1964, 1967), whose papers should 
be consulted for further references in this field. 


Allocation of experimental units: randomization 

36.39 Despite the complications into which the proliferation of AV models has 
led us, we have not even begun to consider an important source of variability in most 
experimental data. 

In 36.32 we left aside the question of the allocation of experimental units to the 
various row-column combinations to be used, and we saw in 36.36 that the general 
model there designated as Case 3 assumed that the $n_{ij}$ units allocated to a selected 
row-column combination come from a distinct population of $N_{ij}$ units, so that there are 
rc populations of experimental units. This is an extreme situation: the populations 
of units do not overlap at all. 

At the other extreme is the situation where all the experimental units to be used 
(e.g. rcm in the balanced two-way classification) are selected at random without replace- 
ment from a single population of N units. Here $N_{ij} = N$ for all i, j, and there is 
complete overlap, so to speak. This method of allocation is called complete randomiza- 
tion, and any experiment employing it is a completely randomized experiment. 
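
Complete randomization in this sense can be sketched in a few lines. The example is illustrative only: the function name and the representation of units by identifiers are our assumptions. All rcm units are drawn without replacement from the single population, and the draw is then divided among the rc row-column combinations, m units to each.

```python
import random


def completely_randomize(unit_ids, r, c, m, rng):
    """Allocate m units to each of the r*c row-column combinations.

    unit_ids is a list identifying the N units of the single population;
    the r*c*m units used are drawn from it at random without replacement.
    Returns {(i, j): [unit ids allocated to row i, column j]}.
    """
    needed = r * c * m
    if needed > len(unit_ids):
        raise ValueError("population too small for the design")
    drawn = rng.sample(unit_ids, needed)  # random order, no repeats
    return {(i, j): drawn[(i * c + j) * m:(i * c + j + 1) * m]
            for i in range(r) for j in range(c)}
```

The separate-populations scheme of Case 3 would instead call `rng.sample` once for each combination, on that combination's own list of units.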

There are also obviously intermediate situations of partial overlap, where groups 
of row-column combinations share the same population of experimental units, and 
there are more than one but less than rc such populations. These would still be called 
randomized allocations (though not “completely randomized”). For example, each 
row (or each column) of the experiment may have its own population of experimental 
units—this is the case when the allocation is in randomized blocks, to which we shall 
return in Chapter 38. 


36.40 The reader will perhaps wonder why, in 36.39, the term “ randomized ” 
is denied only to the separate-populations method of allocation, for, after all, each of 
these populations is sampled at random. Like most confusing usage, this can be 
understood from the early history of the subject. The early formulations of AV did 
not explicitly allow for sampling of the separate-populations kind—(36.57) indeed 
shows that so long as each population is large, its size exercises little influence on the 
Case 3 AV table. On the other hand, questions of efficiency in experiment design 
(to which we shall turn in Chapter 38) forced early consideration of randomized blocks 
and similar methods of allocating experimental units, and here the explicit and essential 
randomization procedure became eponymous. 


36.41 Randomization of experimental units should clearly be taken into account 
in the analysis of the data, but this leads to considerable complications, and we need 
some new definitions. 

Explicitly, we wish to allow for the possibility that $\mu_{ijp}$, defined at (36.52) as the 
mean value of the pth observation on the (i,j)th row-column combination, may itself 
depend upon the characteristics of the particular experimental unit upon which the 
pth observation is made, as well as upon (i,j). Consider any group of row-column 
combinations which share the same population of experimental units, as discussed in 
36.39, and suppose there are m members of this group, and that their population con- 
tains M experimental units, M>m. We now make a formal two-way classification 
of group-members against units. We emphasize that (i,j) is now being treated as a 
single classification by bracketing these suffixes on the right of the identity 

    $$ \mu_{(ij)p} = \mu_{(**)*} + \{\mu_{(**)p} - \mu_{(**)*}\} + \{\mu_{(ij)*} - \mu_{(**)*}\} + \{\mu_{(ij)p} - \mu_{(**)p} - \mu_{(ij)*} + \mu_{(**)*}\} $$    (36.58) 
    $$ \phantom{\mu_{(ij)p}} = \mu_{(ij)*} + \{\mu_{(**)p} - \mu_{(**)*}\} + \{\mu_{(ij)p} - \mu_{(**)p} - \mu_{(ij)*} + \mu_{(**)*}\}, $$    (36.59) 

where asterisks denote averaging as before. 

(36.58) resolves $\mu_{(ij)p}$ into a “general mean,” two “main effects,” and an “inter- 
action.” If both terms in braces in (36.59) are identically zero, the allocation of 
experimental units to row-column combinations is irrelevant to $\mu_{(ij)p}$, for then it equals 
its average over all units. The first term in braces in (36.59) is called the unit error 
of the experimental unit concerned. The second term in braces there will be called 
the interactive error for the experimental unit and row-column combination concerned. 
(More usually it is called the unit-treatment interaction.) 


36.42 When the resolution of $\mu_{ijp}$ into three components at (36.59) is superposed 
on the underlying two-way classification scheme set out in 36.32-3, the model becomes 
complicated. The mere fact that the interactive error term in (36.59) carries suffixes 
i, j and p, as does also $\epsilon_{ijp}$ in (36.52) (which we now call the technical error, to distinguish 
it from the unit and interactive errors defined above), leads us to expect difficulties in 
estimation of the various components of the model. 

It is worth emphasizing that the technical error alone is an error term in the usual 
sense, arising from inaccurate measurement or observation. The unit and interactive 
errors arise purely from the allocation of experimental units to the row-column com- 
binations. 


36.43 Wilk and Kempthorne (1955-6) discuss the one-, two- and three-way cross- 
classification in Case 3 of 36.34, including the unbalanced case, when there is complete 
randomization of experimental units. For the case of proportional frequencies, an 
orthogonal AV is always possible (cf. 35.21-2). For general (non-proportional) fre- 
quencies, a non-orthogonal AV using unweighted means of levels is used (cf. 35.31). 
The difficulties anticipated at the end of 36.42 duly arise: only certain functions of 
the parameters can be estimated. Moreover, as Plackett (1960) remarks, it is difficult 
to regard unequal frequencies $n_{ij}$ as fixed when row- and column-levels are being 
sampled. The addition of another sampling process to determine the $n_{ij}$ (which 
might also be correlated with the observed values of y) complicates the analysis further. 
Harville (1967b) allows the frequencies to be correlated with the effects in the un- 
balanced one-way classification in Model II, and investigates the resulting bias in 
the estimation of $\sigma_1^2$ (cf. 36.23-7 above). 


36.44 We shall defer further discussion of randomization models until Chapter 38, 
where we shall examine their rationale. We have examined their effects on AV pro- 
cedures sufficiently closely for our present purpose, and we conclude this chapter with 
some discussion of the implications of its contents. 


The choice of an AV model 

36.45 The plethora of models now available for AV presents the applied statistician 
with a problem which, in less acute forms, arises in the use of statistical techniques 
generally. Evidently, careful analysis of the known facts concerning the origins of the 
observations must be undertaken before a model can be chosen which reasonably 
represents the real-life situation; and where there is little such knowledge, a good deal 
of guesswork may be necessary. In this respect, the statistician is experiencing a 
situation familiar in almost every field of applied science. 

It is worth pointing out that the varieties of AV discussed in 36.32-8 differ in their 
assumptions about the methods of selection of the levels of the factors being analysed, 
and not in any assumptions about the real nature of these factors or of the variables 
underlying them. Provided that the data arise from a designed experiment, the 
assumptions are concerned with the behaviour of the experimentalist rather than that 
of his material. On the other hand, the complications of 36.39-43 are essentially 
concerned with the nature of the material being experimented on. It must always 
be a matter for the experimenter to judge whether his experimental units differ enough 
to make these added complications in the analysis worth while—in social, agricultural 
and biological work they sometimes do, and in physical and industrial experimentation 
they often do not. Our present point is that, even when the observations arise by 
deliberate design, there is ample scope for intuitive skill in such judgements. A fortiori, 
when the observations are not the result of a designed experiment, the validity of the 
chosen analysis will depend on the insight of the statistician. 


36.46 It will be clear, then, that AV, like other statistical techniques, is not a mill 
which will grind out results automatically without care or forethought on the part of 
the statistician. It is, rather, an assortment of delicate instruments which can be 
brought into use when appropriate. It requires skill, as well as hard work, in use. 
Elaborate techniques need not be (though they sometimes have been) applied to prove 
something which was almost obvious to inspection from the start—the statistician must 
never lose sight of the need to scrutinize his material. Equally, inappropriate analyses 
have often been made. AV has no monopoly of the misapplications of statistics, but 
the multiplicity of models now available makes it particularly vulnerable to the errors 
of the single-minded enthusiast. 


36.47 In the next chapter, the last of our three concerned with AV techniques, 
we first investigate the problem of transforming the data so that AV may be used. 
This leads to a discussion of the robustness of AV and of distribution-free methods in 
this field. Finally, we shall there consider the difficulties produced by incomplete data. 


EXERCISES 


36.1 In Example 36.3, show that the power function of the size-$\alpha$ test of $H_0: \sigma_1^2 = 0$
against $H_1: \sigma_1^2 > 0$ is

$$P(\sigma_1^2) = 1 - G\left\{\left(1 + \frac{n\sigma_1^2}{\sigma^2}\right)^{-1} G_\alpha\right\},$$

where G is the distribution function of the central F-distribution with appropriate degrees of
freedom and $G_\alpha$ the value exceeded with probability $\alpha$, so that $G\{G_\alpha\} = 1-\alpha$. Show that
$P(\sigma_1^2)$ is monotonic in its argument, that the test is unbiassed, and that it is consistent as group
size n increases to infinity, but not if the number of groups p alone $\to \infty$.


36.2 In 36.17, show that since ӯ is a component of Т, the term exp (— 4nj?/p(A)} in (36.35) 
will not affect the UMP similar test of Н, against H, even if p(A) is a constant multiple of 2р 
or Ag. 


36.3 In (36.26), show that the sum of all n latent roots, 2; + x A;nj, equals the sum of 


the variances of the л observations. Hence show in Example 36.5 hat Al = dy. 


36.4 Show that the UMP similar one-sided F-tests of 36.17 are unbiassed, and are thus 
UMPU size-$\alpha$ tests.

36.5 For the balanced one-way classification in Examples 36.3 and 36.5, show that the
ML estimators of $\lambda_1$ and $\lambda_2$ are


4, = Sj, А = SJ(m-1) if 27 


М = hy = {+рфи-1)}/п ifi. 
Hence show that the LR test statistic Z for Hy: 4: = A, is given by 


в = Bum if >í, 


but become 


a 
- 1 if 4 «4, 
so that / = 1 whenever the test statistic F< 23 and the LR test is not equivalent to the F-test. 


Show further that for p» lis a monotone decreasing function of F. Verify that the 


critical values F,( p — 1, p(m—1)} always exceed eem for х<0:25, so that for all practical purposes 


the LR test is equivalent to the UMP similar F-test. (Herbach, 1959) 


36.6 For the balanced two-way cross-classification of Examples 36.4 and 36.6, show that 
the LR test of Ну: cz = 0 is a function of Sẹ S, and S,, whereas the UMP similar F-test is 
a function of S,/S, only, and thus that the tests are not equivalent. (Herbach, 1959) 


36.7 If $S_j/\lambda_j$ are independently distributed $\chi^2$ variables with $\nu_j$ d.fr., and $\lambda = \sum_j c_j\lambda_j$,
show that $S = \sum_j (c_j S_j/\nu_j)$ has its first two moments identical with those it would have if $\nu S/\lambda$
were also a $\chi^2$ variable with d.fr.

$$\nu = \lambda^2 \Big/ \sum_j (c_j^2\lambda_j^2/\nu_j),$$

estimated by

$$\nu^* = S^2 \Big/ \sum_j (c_j^2 S_j^2/\nu_j^3).$$

Hence show in 36.19 that an approximate F-test of (36.41) may be based on the ratio

$$(MS)_R/\{(MS)_{RC} + (MS)_{RT} - (MS)_{RCT}\},$$

distributed in the variance-ratio form, when (36.41) holds, with $(r-1)$ and $\nu^*$ d.fr., where each
$c_j = \pm 1$ and $(MS)_j = S_j/\nu_j$.

(Satterthwaite (1941) and Box (1954) verify the
approximation numerically when the $c_j$ are positive.)


36.8 In 36.18-19, show that in the general r-way balanced cross-classification, only variances
with r and (r - 1) subscripts can be tested by the UMP similar test of 36.17, but that approximate
tests based on Exercise 36.7 can be made for all other variances. 


36.9 Verify (36.39) and also (36.44) in Example 36.8. 


36.10 In Example 36.3, show that the variance of the unbiassed estimator of $\sigma_1^2$ is given by

$$\frac{2}{n^2}\left\{\frac{(\sigma^2+n\sigma_1^2)^2}{p-1} + \frac{\sigma^4}{p(n-1)}\right\},$$

and hence that the estimator is consistent as $p \to \infty$, but not if n alone $\to \infty$. (Cf. Exercise
36.1, where test consistency requires that $n \to \infty$.)
(Tukey (1956-7) gives general expressions for
variances and covariances of estimators of variances.)

36.11 In Example 36.3, obtain confidence intervals for $\sigma^2$ and for $\sigma_1^2/\sigma^2$ from the distributions
of $S_2$ and $S_1/S_2$ respectively. Show that the latter intervals may be partly or wholly below
the value zero.

36.12 In 36.25, show that the minimal sufficient statistic for the three parameters has

$$s = p + 1 - \sum_{r=3}^{p} (r-2)d_r$$

components, where $d_r$ is the number of r-tuples of common values among the p values $n_i$.
Show also that

$$s = 1 + d_1 + 2\sum_{r=2}^{p} d_r.$$


36.13 Prove (36.50). Show that if and only if the $n_i$ are all equal, its solution with (36.48)
gives a unique estimator of $\sigma_1^2$.
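
36.14 If we denote by $\chi^2_{2g_j}$ independent $\chi^2$ variables with $2g_j$ d.fr., where the $g_j$ are positive
integers, show by resolving the c.f. of $\sum_{j=1}^{r} c_j\chi^2_{2g_j}$ into partial fractions that

$$\mathrm{Prob}\Big\{\sum_{j=1}^{r} c_j\chi^2_{2g_j} > x\Big\} = \sum_{j=1}^{r}\sum_{s=1}^{g_j} \alpha_{js}\,\mathrm{Prob}\{\chi^2_{2s} > x/c_j\}.$$

Writing $1-2itc_j = y$ in the c.f., show that the constants on the right are given by

$$\alpha_{j,\,g_j-h} = f_j^{(h)}(0)/h!,$$

where

$$f_j(y) = \prod_{k\neq j} \{1 - (c_k/c_j)(1-y)\}^{-g_k}.$$

(Box, 1954)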


36.15 If S,/($--2) and S,/A are independent у? variables with fı, f, d.fr. respectively, 
Mi = Si/ft, Е = М,/М, (the variance-ratio statistic), and Е, у, is the 100x per cent point 
of the distribution function of F with (fı, f;) d.fr., show that if 


P{M,g(F) <$} = a 
for a monotone increasing function g(F), it should satisfy the conditions 
g (Fag) = 0, 
Е(Е) ~ F/F, œ аз F—> со. 
Show that these аге satisfied by 


mU) = (Е-Е, 4.)/Fa, о 
gi) = (ЕЈЕ,, «)—1 + (Fa, у,/Е) {1 — (Fa, f,/ Fa, 0)}- 


(Bulmer (1957) showed that $g_1$ is a poor, and $g_2$ a remarkably
good, approximation; see also Scheffé (1959).)


өлү 
u=(a ta) - 


Show that a $100(1-\alpha)\%$ confidence interval for $\sigma_1^2/\sigma^2$ is given by


Euy. \* 
n-p & mink 
Gita SET PE e Sa [s < Giu 


and by 


36.16 In 36.23-6, write 


where G, is defined in Exercise 36.1. 
(Wald (1940); cf. also Spjotvoll (1967).) 


CHAPTER 37
THE ASSUMPTIONS OF THE ANALYSIS OF VARIANCE 


37.1 When it has been decided that a particular model is appropriate to a given 
situation, an important problem remains for consideration. Although natural con- 
siderations of convenience or technique may dictate that the observations be made on 
a variable y, it still has to be decided which function of y is to be used for the purpose 
of the analysis. There is no reason why the quantity measured, rather than some 
function of it, should be best suited to the assumptions of the model. 

There may, indeed, be compelling practical reasons for the consideration of a par- 
ticular function of y, say g(y) (which may simply be y itself): for example, g(y) may 
be closely related to the cost or the profitability of a process under investigation. But 
this implies only that the conclusions of the analysis should finally, for practical pur- 
poses, be expressed in terms of g(y); it certainly does not justify the presumption 
that the model is better satisfied by g(y) than by any other function of y.


37.2 Putting the problem slightly more formally, we may say that a set of observa- 
tions on y are, equally, a set of “ observations” on any well-defined function g(y). 
The question is, which “ observations ” g(y) shall we use? Evidently, we must try to 
determine the function which as nearly as possible satisfies the model. The search 
for a transformation of this kind was first treated generally by Box and Cox (1964), 
whose investigation is generally applicable to the linear model (with normal errors) 
of which the AV Model I in Chapter 35 is a specialization. The reader will observe 
that the preceding introductory paragraphs are not restricted to the AV context, for 
the problem is a general one. 


Transformations to the normal linear model 

37.3 Following Box and Cox (1964), suppose that we observe a dependent variable
y and a set of regressor variables $x_1, x_2, \ldots, x_p$; and that we wish to employ the linear
model with normal errors. However, we are not prepared uncritically to assume that
we may validly write

$$\mathbf{y} = \mathbf{X}\boldsymbol\theta + \boldsymbol\epsilon;$$

rather, we seek transformations both of y and of each of the x's so that we have

$$\mathbf{y}_\lambda = \mathbf{X}_\mu\boldsymbol\theta + \boldsymbol\epsilon, \tag{37.1}$$

where the components of $\boldsymbol\epsilon$ are independently normal with zero means and constant
variance $\sigma^2$. In (37.1), $\lambda = (\lambda_1, \lambda_2, \ldots)$ indexes the transformation of y within some
selected parametric family of transformations, and similarly $\mu = (\mu_1, \mu_2, \ldots, \mu_p)$ indexes
the (separate) transformations of the regressors $x_1, x_2, \ldots, x_p$. We are thus generalizing
our introductory discussion, where only transformation of y was envisaged.

37.4 By (37.1), the LF is, in logarithmic form,

$$\log L_{\lambda,\mu}(\mathbf{y} \mid \boldsymbol\theta, \sigma^2) = -\tfrac12 n\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}(\mathbf{y}_\lambda - \mathbf{X}_\mu\boldsymbol\theta)'(\mathbf{y}_\lambda - \mathbf{X}_\mu\boldsymbol\theta) + \log J_\lambda, \tag{37.2}$$

where $J_\lambda$ is the Jacobian of the inverse transformation from $y_\lambda$ (the normally distributed
variable in (37.1)) to the actually observed y. Now, when the LF (37.2) is maximized
for given $\lambda$, $\mu$, with respect to $\boldsymbol\theta$ and $\sigma^2$, we find as in 24.28, Vol. 2, that the middle
term becomes a constant. If we neglect constants, therefore, we have the conditional
maximum for fixed $\lambda$, $\mu$,

$$\log L_{\lambda,\mu}(\mathbf{y} \mid \hat{\boldsymbol\theta}, \hat\sigma^2) = -\tfrac12 n\log\hat\sigma^2_{\lambda,\mu} + \log J_\lambda, \tag{37.3}$$

where $n\hat\sigma^2_{\lambda,\mu} = \mathbf{y}_\lambda'\mathbf{T}\mathbf{y}_\lambda$, say, is the Residual SS, again as in 24.28.

We now need to compute the absolute maximum of the conditional maxima (37.3)
over the whole range of $\lambda$, $\mu$. This is, even with the aid of electronic computing, a
formidable numerical task, except when only one or two transformation indices are
involved, e.g. when

(a) only the dependent variable y is transformed and $\lambda$ has only one or two components; or

(b) the same transformation is applied to all of, or a subset of, the regressors, so that
$\mu$ has only one or two components; or

(c) $\lambda$ has a single component as in (a), and (b) holds with only one component in $\mu$.


In cases (b) and (c), numerical plotting of the contours of (37.2) for all $\lambda$, $\mu$ will
generally be necessary. We now confine ourselves to case (a), where only the dependent 
variable is being transformed. In AV problems, where the regressors are 0-1 variables 
(cf. 35.9-10) since we are dealing with classified data, this is not a restriction of conse- 
quence. In more general regression studies, it implies that we can choose proper forms 
for the regressor variables before considering transformation of the dependent variable. 
Box and Tidwell (1962) discuss transformations of the regressors to simpler form 
(cf. Exercise 37.9); such transformations do not, of course, affect the normality or 
homoscedasticity of the errors. 


37.5 Returning, therefore, to the purpose outlined in our initial discussion, we
consider transformations of y alone. In practice, the most useful transformations have
been found to be the powers and the logarithm of y, possibly translated by a constant.
We therefore consider the family of transformations

$$y_\lambda = \begin{cases} (y+\lambda_2)^{\lambda_1}, & \lambda_1 \neq 0, \\ \log(y+\lambda_2), & \lambda_1 = 0. \end{cases} \tag{37.4}$$

To avoid a discontinuity at $\lambda_1 = 0$, we rewrite this equivalently as

$$y_\lambda = \begin{cases} \{(y+\lambda_2)^{\lambda_1}-1\}/\lambda_1, & \lambda_1 \neq 0, \\ \log(y+\lambda_2), & \lambda_1 = 0. \end{cases} \tag{37.5}$$

Tukey (1957b) studied and charted the structural features of the family (37.4) for $\lambda_1 \leq 1$,


and Dolby (1963) considered properties of the differential equation which it satisfies, 
namely 


(ох)? = 071). 

Healy and Taylor (1962) give tables to facilitate fractional power transformations when
$\lambda_2 = 0$ and $\lambda_1$ is a multiple of 0.2.


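37.6 In (37.3), we now have

$$\log J_\lambda = (\lambda_1 - 1)\sum_{i=1}^{n}\log(y_i+\lambda_2), \tag{37.6}$$

and (37.3) can be plotted for selected $(\lambda_1, \lambda_2)$ for numerical determination of the absolute
maximum. An AV must be carried out for each $(\lambda_1, \lambda_2)$ used, to obtain the Residual SS
in (37.3). In the simplest case when $\lambda_2 = 0$, this can be avoided by equating to zero
the first derivative of (37.3) with respect to $\lambda_1$. Using (37.5-6), this gives

$$0 = \frac{\partial \log L_{\lambda}(\mathbf{y} \mid \hat{\boldsymbol\theta}, \hat\sigma^2)}{\partial\lambda_1} = -n\,\frac{\mathbf{y}_\lambda'\mathbf{T}\mathbf{u}}{\mathbf{y}_\lambda'\mathbf{T}\mathbf{y}_\lambda} + \frac{n}{\lambda_1} + \sum_{i=1}^{n}\log y_i, \tag{37.7}$$

where the elements of $\mathbf{u}$ are $\{\lambda_1^{-1} y_i^{\lambda_1}\log y_i\}$.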
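
The grid search over the conditional maxima (37.3) can be sketched numerically. The code below is a minimal illustration only, under assumptions that are not in the text: the data form a one-way classification (so that the Residual SS is the within-class sum of squares), $\lambda_2 = 0$, and the function names and the small data set are hypothetical.

```python
import math

def box_cox(y, lam):
    """Transformation (37.5) with lambda_2 = 0 (assumed here)."""
    if abs(lam) < 1e-12:
        return math.log(y)
    return (y ** lam - 1.0) / lam

def profile_loglik(groups, lam):
    """Conditional maximum (37.3), up to a constant, for a one-way classification.

    groups: list of lists of positive observations, one list per class.
    The Residual SS is the within-class sum of squares of the transformed data.
    """
    n = sum(len(g) for g in groups)
    rss = 0.0
    log_jacobian = 0.0
    for g in groups:
        z = [box_cox(y, lam) for y in g]
        zbar = sum(z) / len(z)
        rss += sum((v - zbar) ** 2 for v in z)
        # log J from (37.6) with lambda_2 = 0
        log_jacobian += (lam - 1.0) * sum(math.log(y) for y in g)
    return -0.5 * n * math.log(rss / n) + log_jacobian

def best_lambda(groups, grid):
    """Grid value of lambda_1 at which (37.3) is largest."""
    return max(grid, key=lambda lam: profile_loglik(groups, lam))

# Hypothetical data in which the spread grows with the class mean.
groups = [[1.2, 1.9, 1.5, 2.4], [3.1, 4.8, 3.9, 6.0], [8.2, 12.5, 10.1, 15.9]]
grid = [k / 10.0 for k in range(-20, 21)]
lam_hat = best_lambda(groups, grid)
print(lam_hat, profile_loglik(groups, lam_hat))
```

One AV is carried out for each grid value, exactly as the text describes; solving (37.7) directly would replace the grid search when $\lambda_2 = 0$.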


LR tests of nested hypotheses 

37.7 Box and Cox (1964) present some interesting numerical examples of the 
application of this method of finding a transformation, and of a parallel Bayesian 
method of analysis which they develop. In addition, they consider the resolution of 
the maximized LF into three components corresponding to the normality, the homo- 
scedasticity, and the structure of the expectation of у. Their procedure is of general 
applicability. 

Consider sets of constraints $C_1, C_2, \ldots$ to be applied successively to a mathematical
model, and let $\hat\lambda_{(p)}$ be the ML estimator of $\lambda$ when all of $C_1, C_2, \ldots, C_p$ have been
applied. $\hat\lambda$, without suffix, is the ML estimator when no constraint is imposed. Then,
identically for any s,

$$L(y \mid \hat\lambda_{(s)}) = L(y \mid \hat\lambda)\cdot\frac{L(y \mid \hat\lambda_{(1)})}{L(y \mid \hat\lambda)}\cdot\frac{L(y \mid \hat\lambda_{(2)})}{L(y \mid \hat\lambda_{(1)})}\cdots\frac{L(y \mid \hat\lambda_{(s)})}{L(y \mid \hat\lambda_{(s-1)})} = L(y \mid \hat\lambda)\, l_1 l_2 \cdots l_s, \tag{37.8}$$

where $l_p$ is the LR test statistic for testing the set of constraints $C_1, C_2, \ldots, C_{p-1}, C_p$
against the set $C_1, C_2, \ldots, C_{p-1}$ (cf. 24.1, Vol. 2). Each of the $l_p$ lies between 0 and 1,
and under regularity conditions, $-2\log l_p$ is asymptotically a non-central $\chi^2$ variable
with d.fr. equal to the number of independent constraints upon parameters imposed
by $C_p$ (cf. 24.7). When $C_p$ holds, this becomes a central $\chi^2$ variable, and thus $-2\log l_p$
may be used to test the value of adding $C_p$ to the already imposed $C_1, C_2, \ldots, C_{p-1}$.
It should be observed that the $l_p$ are not in general independently distributed, though
in particular cases they may be independent under certain hypotheses (cf. Exercises 24.6 
and 24.13, and the more general result of Exercise 37.2). The application of the 
resolution (37.8) to the present problem is left to the reader as Exercise 37.1, since 
it follows immediately from some results given in Chapter 24. 


The purposes of transformation 
37.8 The virtue of the ML approach discussed in 37.3-7 is that it requires no 
prior knowledge of the relationship between y and the regressors, or of the nature of 
the error distribution of the untransformed y. It starts from the assumption that 
there exists some transformation in the family considered for which all the conditions 
of the linear model, including homoscedasticity and normality of the error distribution, 
are satisfied. In particular cases, of course, this may not be so; but even then, the 
ML procedure for choice of the transformation must presumably be an improvement 
on the uncritical use of y in its original form. It is a striking fact (evidenced by the 
numerical examples given by Box and Cox (1964)) that this ML transformation is often 
very close to what is suggested by non-statistical consideration of the nature of the 
underlying variables. Such consideration should, of course, be undertaken wherever 
possible as a supplement and guide to the statistical analysis itself. 

J. B. Kruskal (1965) gives a computer-based method of finding the monotone 
transformation of the observations which minimizes the Residual SS (suitably scaled) 
from an assumed linear model. No parametric family like (37.5) is required; nor 
is the normality assumption. He uses his method to re-analyse the Box and Cox 
examples, with several others. 


37.9 Other approaches to transforming the data to meet the needs of the linear 
model have been less ambitious. They seek either to normalize the errors or to stabilize 
their variance or to remove interactions so that effects are additive; and the hope is 
general that a transformation which effects one of these aims will at least help towards 
achieving the others. It is remarkable that this indeed often turns out to be the case, 
and we shall examine some important instances shortly, but it is over-sanguine to expect 
this to be always so. It is easy to construct examples where the goals of additivity 
and homoscedasticity conflict, for if in a two-way cross-classification the expected value 
of y is additive in row- and column-effects, but the errors are non-normally distributed 
with variance a function of E(y), any transformation to remove the heteroscedasticity will
destroy exact additivity, whatever may happen to the non-normality. 

We now examine these different types of transformation in turn. 


Variance-stabilizing transformations 
37.10 Suppose that a statistic t has mean $\theta$ and variance, for fixed sample size n,

$$\mathrm{var}\, t = D_n^2(\theta). \tag{37.9}$$

To eliminate this dependence of variance on the parameter $\theta$, we seek a function
u(t) such that var u is a constant, c. In general, however, we are unlikely to be able
to achieve this precisely, so we ask only that

$$\mathrm{var}\{u(t)\} = c\{1+O(R^{-1})\} \tag{37.10}$$

where R is some known constant which is large enough for $R^{-1}$ to be negligible. In
particular, we may have R = n, the sample size. We now assume t to be confined
to a neighbourhood of its mean $\theta$. The argument of 10.6-7 then applies, and we
have from (10.14) the approximation

$$\mathrm{var}\{u(t)\} = \left(\frac{\partial u}{\partial t}\right)^2_{t=\theta}\mathrm{var}\, t. \tag{37.11}$$

If (37.10) and (37.11) are equated, we have the first-order approximation

$$\left(\frac{\partial u}{\partial t}\right)_{t=\theta} = \left\{\frac{c}{D_n^2(\theta)}\right\}^{1/2}. \tag{37.12}$$

Since we are considering only the neighbourhood of $\theta$, we drop the suffix "$t = \theta$,"
and write $\theta$ for t. Thus

$$\frac{\partial u}{\partial\theta} \propto \{D_n^2(\theta)\}^{-1/2}, \tag{37.13}$$

where we drop the constant c without loss, since this is in any case at choice, for multiplication
of u(t) by a constant will not affect our purpose of achieving (37.10). We now
integrate the equation (37.13), again ignoring the additive constant which results from
the indefinite integration without loss, since (37.10) is unaffected. We obtain

$$u(t) \propto \left[\int\frac{d\theta}{\{D_n^2(\theta)\}^{1/2}}\right]_{\theta=t}. \tag{37.14}$$


37.11 Although (37.14) was arrived at through approximation, we can check its
validity if the theoretical distribution of t is known by computation of the theoretical
variance of u(t) to verify its stability as $\theta$ varies; it may be found desirable to modify
u(t) to improve stability. Where, on the other hand, we have only observations upon
t and no prior knowledge of its distribution or of the parameter $\theta$ of that distribution,
we cannot even compute $D_n^2(\theta)$ precisely. In such cases, the mean and variance of t
in separate groups of observations are calculated, and the latter plotted against the
former to give an estimate of the relationship (37.9), on which the transformation (37.14)
is then based. Here, the approximation is more hazardous, but nevertheless often
gives satisfactory results in practice.
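
The empirical procedure just described can be sketched as follows. This is an illustration under a stated assumption that is not in the text: the plotted relationship is taken to be a power law, var t proportional to $\theta^b$, so that (37.14) gives $t^{1-b/2}$ (or log t when b = 2). The function names and the group summaries are hypothetical.

```python
import math

def variance_power(means, variances):
    """Least-squares slope b of log(variance) on log(mean), i.e. var ~ mean**b."""
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx

def stabilizing_transform(b):
    """Apply (37.14) to a variance function proportional to theta**b."""
    if abs(b - 2.0) < 1e-9:
        return math.log          # integral of theta**(-1)
    power = 1.0 - b / 2.0        # integral of theta**(-b/2), constants dropped
    return lambda t: t ** power

# Hypothetical group means and variances, roughly Poisson-like (b near 1).
group_means = [2.1, 4.0, 8.3, 15.9]
group_vars = [2.3, 3.8, 8.9, 15.1]
b_hat = variance_power(group_means, group_vars)
u = stabilizing_transform(b_hat)
print(b_hat, u(9.0))
```

In practice the fitted b would usually be rounded to a convenient value (1 for the square root, 2 for the logarithm) before the transformation is applied.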


Example 37.1
If t has the Poisson distribution, (5.20) shows that mean and variance are equal,
so (37.9) is here simply

$$D_n^2(\theta) = \mathrm{var}\, t = \theta$$

and (37.14) gives

$$u(t) \propto \left(\int\theta^{-1/2}\,d\theta\right)_{\theta=t} \propto t^{1/2}, \tag{37.15}$$

a simple square-root transformation. To the first order, by (37.11),

$$\mathrm{var}(t^{1/2}) = \{(\tfrac12 t^{-1/2})^2\}_{t=\theta}\,\mathrm{var}\, t = \tfrac14, \tag{37.16}$$

verifying the variance stabilization to this order.
Bartlett (1936) pointed out that variance stabilization could be improved in this
case by re-locating t before taking the square root. If we define

$$u_c(t) = (t+c)^{1/2},$$

Bartlett suggested using $c = \tfrac12$. Exercise 37.15 shows that $c = \tfrac38$ is a better choice.
The following table gives the variance of $u_c(t)$ as a fraction of its limiting
variance as $\theta \to \infty$, for c = 0, $\tfrac12$ and $\tfrac38$; the calculations were made by Bartlett (1938)
and Anscombe (1948).

          Variance of u_c(t) as a fraction of
                   limiting variance
   θ        c = 0      c = 1/2    c = 3/8

   0.0      0          0          0
   0.5      1.240      0.408      -
   1.0      1.608      0.640      0.717
   2.0      1.560      0.856      0.924
   3.0      1.360      0.928      0.983
   4.0      1.224      0.960      0.999
   6.0      1.104      0.980      1.002
   9.0      1.052      0.988      -
  10.0      -          -          1.001
  12.0      1.036      0.992      -
  15.0      1.024      0.992      -
  20.0      -          -          1.000


The inadequacy of the simplest transformation with c = 0 is evident for small $\theta$. For
$\theta$ less than 3, the same comparison is made graphically in Fig. 37.1, adapted from
Freeman and Tukey (1950), whose own variance-stabilization proposal, $u' = t^{1/2}+(t+1)^{1/2}$,
is more stable than $u_c(t)$ for $\theta < 2$, after which either is adequate. $u'$ is within 6 per
cent. of stability for $\theta \geq 1$, and seems the best choice (cf. Exercise 37.17). Mosteller
and Youtz (1961) give a table of $u'$ for x = 1(1)50.

[Fig. 37.1]
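
The tabulated ratios in this Example can be checked by direct summation over the Poisson probabilities. The sketch below is only an illustrative check; the function name and the truncation rule for the infinite sum are assumptions made here, not part of the text.

```python
import math

def poisson_sqrt_variance_ratio(theta, c):
    """Exact var{(t + c)**0.5} for Poisson t with mean theta, divided by 1/4."""
    if theta == 0.0:
        return 0.0  # t is identically zero, so the variance vanishes
    # Truncate the sum far into the upper tail (an assumed, generous cut-off).
    k_max = int(theta + 40.0 * math.sqrt(theta) + 60.0)
    pmf = math.exp(-theta)          # P(t = 0)
    mean_root = 0.0
    for k in range(k_max + 1):
        mean_root += math.sqrt(k + c) * pmf
        pmf *= theta / (k + 1)      # P(t = k + 1) from P(t = k)
    variance = (theta + c) - mean_root ** 2   # E(t + c) - {E (t + c)**0.5}**2
    return 4.0 * variance

for c in (0.0, 0.5, 0.375):
    print(c, [round(poisson_sqrt_variance_ratio(th, c), 3) for th in (1.0, 2.0, 4.0)])
```

The division by 1/4 uses the limiting variance given at (37.16).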

Example 37.2
In samples from a normal distribution, we know from our earlier work that if t
is the sample variance, and $\sigma^2$ the population variance, $z = nt/(2\sigma^2)$ is a Gamma
variate with parameter $\tfrac12(n-1)$, i.e. the distribution of z is

$$dF(z) = \frac{1}{\Gamma\{\tfrac12(n-1)\}}\,e^{-z}z^{\frac12(n-1)-1}\,dz$$

(cf., e.g., (11.25) in different notation). The mean and variance of z are therefore
each equal to $\tfrac12(n-1)$, and those of t itself are

$$\theta = E(t) = \frac{2\sigma^2}{n}\cdot\tfrac12(n-1) = \sigma^2(n-1)/n,$$

$$D_n^2(\theta) = \mathrm{var}\,t = \left(\frac{2\sigma^2}{n}\right)^2\cdot\tfrac12(n-1) = 2\sigma^4(n-1)/n^2,$$

so that here

$$D_n^2(\theta) = 2\theta^2/(n-1).$$

(37.14) gives

$$u(t) \propto \left(\int\theta^{-1}\,d\theta\right)_{\theta=t} = \log t, \tag{37.17}$$

and we have arrived at the simple logarithmic transformation. Since

$$\log t = \log z + \log(2\sigma^2/n),$$

the cumulants of log t and of log z are identical apart from the constant difference in
the mean, and it is easy to see that these cumulants do not depend upon $\sigma^2$ at all. The
characteristic function of log z is, writing $p = \tfrac12(n-1)$,

$$\phi(w) = \frac{1}{\Gamma(p)}\int_0^\infty e^{-z}z^{p-1+iw}\,dz = \frac{\Gamma(p+iw)}{\Gamma(p)}.$$

If p is integral (n is odd), this becomes

$$\phi(w) = \frac{(p-1+iw)(p-2+iw)\cdots(1+iw)\,\Gamma(1+iw)}{(p-1)(p-2)\cdots 1\cdot\Gamma(1)} = \Gamma(1+iw)\prod_{s=1}^{p-1}\left(1+\frac{iw}{s}\right),$$

so that the cumulant-generating function is

$$\psi(w) = \log\Gamma(1+iw) + \sum_{s=1}^{p-1}\log\left(1+\frac{iw}{s}\right). \tag{37.18}$$

Now $\Gamma(1-iw)$ is the c.f. of the extreme-value distribution (14.66), with cumulants
(cf. Exercise 14.21)

$$\kappa_1 = \gamma \ \text{(Euler's constant, } 0.577\ldots), \qquad \kappa_r = (r-1)!\sum_{s=1}^{\infty}s^{-r}, \quad r \geq 2. \tag{37.19}$$

Thus the cumulants of log z obtained from (37.18) are

$$\lambda_r = (-1)^r\kappa_r + (-1)^{r-1}(r-1)!\sum_{s=1}^{p-1}s^{-r}, \tag{37.20}$$

and substitution of (37.19) into (37.20) gives

$$\lambda_1 = \sum_{s=1}^{p-1}s^{-1} - \gamma, \qquad \lambda_r = (-1)^r(r-1)!\sum_{s=p}^{\infty}s^{-r}, \quad r \geq 2. \tag{37.21}$$

Thus, asymptotically, as p increases through the integers,

$$\gamma_1 = \lambda_3/\lambda_2^{3/2} = -2\sum_{s=p}^{\infty}s^{-3}\Big/\Big(\sum_{s=p}^{\infty}s^{-2}\Big)^{3/2} \sim -p^{-1/2},$$

$$\gamma_2 = \lambda_4/\lambda_2^{2} = 6\sum_{s=p}^{\infty}s^{-4}\Big/\Big(\sum_{s=p}^{\infty}s^{-2}\Big)^{2} \sim 2p^{-1}, \tag{37.22}$$

illustrating the rapidity of the tendency to normality of the distribution of log t.
Bartlett and Kendall (1946) tabulate the mean and variance, $\gamma_1$ and $\gamma_2$ up to n = 20
(p = 9.5), at which point the asymptotic approximations in (37.22) are adequate.
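
The quality of the asymptotic forms in (37.22) can be examined numerically for integral p. The following is a rough sketch: the function names are hypothetical, and the infinite sums are approximated by a long direct sum plus an integral remainder, which is an approximation chosen here for illustration.

```python
def tail_sum(p, r, n_terms=100000):
    """Approximate the sum of s**(-r) over s = p, p+1, ... (integer p >= 1, r >= 2)."""
    upper = p + n_terms
    direct = sum(float(s) ** (-r) for s in range(p, upper))
    # Remaining terms approximated by the integral from upper - 1/2 to infinity.
    remainder = (upper - 0.5) ** (1 - r) / (r - 1)
    return direct + remainder

def log_gamma_shape(p):
    """gamma_1 and gamma_2 of log z from the exact sums in (37.21)-(37.22)."""
    s2, s3, s4 = tail_sum(p, 2), tail_sum(p, 3), tail_sum(p, 4)
    gamma1 = -2.0 * s3 / s2 ** 1.5
    gamma2 = 6.0 * s4 / s2 ** 2
    return gamma1, gamma2

for p in (5, 10, 50):
    g1, g2 = log_gamma_shape(p)
    # Compare with the asymptotic values -p**(-1/2) and 2/p.
    print(p, g1, -p ** -0.5, g2, 2.0 / p)
```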


37.12 The variance-stabilization procedure of 37.10 can be repeated if necessary. 
Suppose that investigation shows the variance of u(t) to be 


var (d) = 4 Dean, (3723) 
satisfying (37.10). If we now seek a second transformed variable v(u) such that

$$\mathrm{var}\{v(u)\} = d\{1+O(n^{-2})\}, \tag{37.24}$$


we have, as at (37.11), the approximation 


var {o(u)} = («ey varu 


аҳиу\> 8 
= (ey d +20} 
by (37.23). Using (37.12), this is 


өө нерн) 
- (H) roh). Ч 


oft) = {f (2: ofi 4 ^i Ж (37.26) 


We have already encountered an instance of this procedure in Hotelling’s improved 
version of Fisher's variance-stabilizing z-transformation at (16.81) (Vol. 1)—cf. Exercises 
16.18-19, and Example 37.3 below. 

The variance-stabilization procedure could evidently be iterated further if this were 
necessary. 


"Thus, as at (37.14), 


Exercises 37.4-6 give further applications of a single variance-stabilizing transformation
by the method of 37.10 to the binomial and negative binomial distributions.


Normalizing transformations

37.13 In 6.25-6, we have already examined the Cornish-Fisher method of obtain- 
ing a normalizing polynomial transformation; and in 6.27-35, we discussed Johnson's 
systems of functional transformations to normality. 

Curtiss (1943) gives a careful mathematical discussion of the limiting normality of 
transformations, especially those discussed in our Examples and Exercises. We shall 
give some examples of the fact mentioned in 37.9, that a transformation designed to 
achieve one purpose (here, variance stabilization) often also helps to achieve another 
(here, normalization). In addition, Exercise 37.16 treats the case dealt with in Example 
37.1, where the same effect occurs. However, the last of our examples will show that 
this harmony of purposes is only obtainable by not pressing for optimal achievement in 
both directions: variance-stabilizing transformations commonly normalize as a by- 
product, but they do not produce the optimum normalization. 


Example 37.3 
We discussed, in 16.33, Fisher's variance-stabilizing transformation of the correla- 
tion coefficient r. The latter was seen at (16.74) to have variance 


Dg) = (1 zem pus roe з), (37.27) 


where $\rho$ is the population correlation parameter. (37.14) applied to the leading term
of (37.27) gives $z(r) = \tfrac12\log\{(1+r)/(1-r)\}$ and the variance of z was seen at (16.77) to be

$$\mathrm{var}\, z = \frac{1}{n-1} + \frac{4-\rho^2}{2(n-1)^2} + O(n^{-3}), \tag{37.28}$$

depending little upon $\rho$, so that variance stabilization is good. (16.78) showed that z
also has skewness coefficient $\gamma_1$ of order $n^{-3/2}$, as against order $n^{-1/2}$ for r; $\gamma_2$ is of order
$n^{-1}$ for both.

It seems clear that the variance stabilization symmetrizes, and hence normalizes, as
a by-product.

Application of (37.26) here gives 

x = z—(3z+r)/(4n) (37.29) 

with variance further stabilized at $(n-1)^{-1}+O(n^{-3})$.
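
As a sketch of how a correction of the form (37.29) can arise from a second stage of stabilization, suppose (an assumption made here for illustration) that the variance of z is of the form $n^{-1}\{1+a(\rho)/n\}$ with $a(\rho) = \tfrac12(4-\rho^2)$, treating n-1 as n in the correction term. The symbol $a(\rho)$ is a label introduced for this sketch and is not the notation of the text.

```latex
\begin{align*}
\operatorname{var} z &\approx \frac{1}{n}\Big\{1+\frac{a(\rho)}{n}\Big\},
   \qquad a(\rho)=\tfrac12(4-\rho^2),\\
\frac{\partial v}{\partial z} &\propto \Big\{1+\frac{a(\rho)}{n}\Big\}^{-1/2}
   \approx 1-\frac{a(\rho)}{2n},\\
v &= z-\frac{1}{2n}\int a(\rho)\,\frac{\partial z}{\partial\rho}\,d\rho
   = z-\frac{1}{4n}\int\frac{4-\rho^2}{1-\rho^2}\,d\rho,\\
\int\frac{4-\rho^2}{1-\rho^2}\,d\rho
  &= \int\Big(1+\frac{3}{1-\rho^2}\Big)\,d\rho = \rho+3z(\rho),\\
v &= z-\frac{3z+r}{4n} \quad\text{on writing } r \text{ for } \rho .
\end{align*}
```

The second line uses $\partial z/\partial\rho = (1-\rho^2)^{-1}$, and the last line agrees with the form shown at (37.29).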

Example 37.4
We return to Example 37.2, where we saw at (37.22) that the variance-stabilized
logarithmically transformed variable had

$$\gamma_1 = -p^{-1/2}, \qquad \gamma_2 = 2p^{-1} \tag{37.30}$$

asymptotically. The untransformed variable is seen from (16.6), with $p = \nu/2$, to have

$$\gamma_1 = 2p^{-1/2}, \qquad \gamma_2 = 6p^{-1}, \tag{37.31}$$

so that the variance stabilization, as by-product, has halved skewness (changing its
sign) and reduced kurtosis by a factor of 3.

Example 37.5
In Example 37.2, suppose now that $\sigma^2$ is known, and that p is the parameter. We
now have E(z) = var(z) = p, and we are back in the same situation as for the Poisson
in Example 37.1: (37.14) gives a square-root transformation. The variance stabilization
is good, as these values given by Bartlett (1936) show:

  p              0      0.5    1.0    2.0    3.0    4.0    9.0    15.0
  var(z^{1/2})   0      0.182  0.215  0.233  0.239  0.242  0.247  0.250

This is equivalent to Fisher's approximation to the $\chi^2$ distribution, treated in
16.5-6. It has, by (16.8),
n= р, nis (37.32) 
distinct improvements over (37.31) for the untransformed variable (and better also
than (37.30) for its logarithm, which has no virtue in the present case). Again, the
variance stabilization has improved the normalization here. But note that the
Wilson-Hilferty normalization of 16.7 has, from (16.13), even better $\gamma_1$, of order $p^{-3/2}$, though
not such good $\gamma_2$, of order $p^{-1}$, and (cf. 16.8) gives the better normal approximation.
However, it does not stabilize variance at all, as (16.12) shows. Thus the best available
normalizing transformation sacrifices variance stability, and the square-root
variance stabilizer is a better compromise transformation.
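
The variance of the square root of a Gamma variate with parameter p can be evaluated exactly from the moments of z, since $E(z^{1/2}) = \Gamma(p+\tfrac12)/\Gamma(p)$ and $E(z) = p$. The short sketch below (the function name is hypothetical) may be used to check the tabulated values in this Example.

```python
import math

def sqrt_gamma_variance(p):
    """var(z**0.5) for a Gamma variate z with parameter p > 0."""
    # E(z**0.5) = Gamma(p + 1/2) / Gamma(p), computed on the log scale.
    mean_root = math.exp(math.lgamma(p + 0.5) - math.lgamma(p))
    return p - mean_root ** 2      # E(z) - {E(z**0.5)}**2

for p in (0.5, 1.0, 2.0, 4.0, 9.0):
    print(p, round(sqrt_gamma_variance(p), 3))
```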


37.14 The reader will see that our discussion of normalization has been couched 
entirely in terms of skewness and kurtosis. Mathematically, it is taking a good deal 
for granted to assume that smaller values of $\gamma_1$ and $\gamma_2$ are equivalent to a closer approach
to normality; but we know of no significant example where this assumption misleads 
us in choosing between normal approximations. 

Blom (1954) seeks functional transformations u(t) for which a further polynomial 
(Cornish-Fisher) transformation as in 6.25-6 has minimum skewness and is therefore 
presumably “ nearest ” to symmetry and normality. This leads to a differential equation 
for u(t) which contains all the transformations that we have encountered for the Poisson, 
Gamma, binomial and negative binomial distributions. 


Transformations to additivity 

37.15 Although in practice it may be important to search for a scale on which 
effects are additive (i.e. interactions disappear) or nearly so, relatively little work has 
been done in this area as compared with normalization and variance stabilization. 
Some general procedures which have been proposed involve minimization, within a 
class of transformations, of the value of the test statistic used for the hypothesis that 
interactions are zero. In the two-way cross-classification, for example, we could 
minimize S; (or S,/Sz) at (35.65) in the balanced case, or (35.69) in the case of a single 
observation in each cell. It will be recognized that the test statistic is here being used 
to carry out a complex estimation procedure, and nothing but intuitive justification 
has so far been given for this method. Such additivity transformations are sometimes 
suggested by the analysis of residuals, which we discuss in 37.18-20. 


37.16 Other, more specialized, transformations will not be considered in this
chapter. We have already (cf. 31.39, Vol. 2) discussed transformation of observations, via
their ranks, to the standard normal order statistics, or normal scores, and in 31.71 mentioned
its use in obtaining a distribution-free test for the one-way classification AV
situation. The Probit and Logit transformations of percentages, respectively to normal
and logistic distribution deviates, arise mainly in biological contexts, and are discussed
by Finney (1952).


Removal of transformation bias 

37.17 Whatever the purpose of a transformation, it often raises problems of 
presentation when the analysis is complete. In particular, estimators of means or of 
differences which are unbiassed on the transformed scale will not be so if the inverse 
transformation is made so that results may be presented in “ natural ” terms (cf., e.g., 
Exercise 37.15). Adjustments of some kind must be made to remove the bias due to 
transformation; a general method of bias-reduction was given in 17.10. We now 
discuss an exact method of removing the bias. 

Suppose that и is normally distributed with mean и and variance o*, and that the 
functions of u, (Á, S*/r), are jointly sufficient statistics for these parameters, / being 
normally distributed with mean and variance 2? 0°, and S*/o*, independent of Å, a 
43 variate with у d.fr. In practice, we usually have 4? = 1/n and » = n—1 where n 
is sample size. Now consider the function (и), which in our terms is the inverse 
transformation. Neyman and Scott (1960) (cf. also the succeeding paper by Schmetterer 
(1960)) showed that if /(и) satisfies the second-order differential equation 

t" (u) = A+ Bt(u) 
for constants A, B, the unique MV unbiassed estimator of the mean of the untrans- 
formed variable 0 = E(t) is given by 

t(fi) + A(1—2*)S?/(2r), В = 0, 
= = im т 1b) BU- A 

(4m È E (ACSF, Buc 


"This series converges very rapidly, only a few terms usually being required for adequate 
accuracy. 
It follows that the bias of the crude estimator /(/), which is simply the inverse 


transformation of Á, is 
a-m.a[-40-2203/2, B = 0, 
E0009 = |р (д/вурехр (-Bqt-195/2—1, B #0, 


and its absolute value is always a monotone decreasing function of 2*. Since usually 
22 = 1/n, the bias will increase with sample size. 
The following are the most important special cases: 


Transformation        Inverse transformation $t(\hat\mu)$    $A$    $B$     Bias                                                          Sign of bias
$(t+c)^{1/2}$         $\hat\mu^2 - c$                       2      0      $-(1-\lambda^2)\sigma^2$                                      Negative
$\log(t+c)$           $\exp(\hat\mu) - c$                   $c$    1      $(\theta+c)[\exp\{-(1-\lambda^2)\sigma^2/2\} - 1]$            $-\mathrm{sgn}(\theta+c)$
$\arcsin(\sqrt{t})$   $\sin^2(\hat\mu)$                     2      $-4$   $(\theta-\tfrac12)[\exp\{2(1-\lambda^2)\sigma^2\} - 1]$       $\mathrm{sgn}(\theta-\tfrac12)$
ar sinh$(\sqrt{t})$   $\sinh^2(\hat\mu)$                    2      4      $(\theta+\tfrac12)[\exp\{-2(1-\lambda^2)\sigma^2\} - 1]$      $-\mathrm{sgn}(\theta+\tfrac12)$

It will be seen that as $\lambda \to 0$ ($n \to \infty$), the bias for the square root transformation $\to -\sigma^2$.
This is the result obtained directly in Exercise 37.15, where $\sigma^2 = \tfrac14$ as at (37.16). The
reader may also recall that the bias result for the logarithmic transformation with
$c = 0$, $\lambda^2 = 1/n$, was contained in Exercise 18.7, Vol. 2. The other two transformations
in the table are those of Exercises 37.4-5.
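The special cases follow mechanically from the general bias formula once $A$ and $B$ are read off from $t'' = A + Bt$. A minimal sketch (the function name and example values are assumptions made for illustration) is:

```python
import math


def crude_bias(theta, A, B, lam2, sigma2):
    """Bias of the crude estimator t(mu_hat) when t''(u) = A + B t(u).

    Illustrative sketch only; argument names are assumptions.
    """
    if B == 0:
        return -A * (1.0 - lam2) * sigma2 / 2.0
    return (theta + A / B) * (math.exp(-B * (1.0 - lam2) * sigma2 / 2.0) - 1.0)


# Square root case: t = u^2 - c, so A = 2, B = 0; theta does not enter.
sqrt_bias = crude_bias(theta=5.0, A=2.0, B=0.0, lam2=0.0, sigma2=0.25)

# Inverse sine case: t = sin^2(u), so t'' = 2 - 4t, i.e. A = 2, B = -4.
arcsin_bias = crude_bias(theta=0.8, A=2.0, B=-4.0, lam2=0.1, sigma2=0.05)
```

With $\lambda^2 = 0$ and $\sigma^2 = \tfrac14$ the square root case gives a bias of $-\tfrac14$, and in the inverse sine case the sign of the bias is that of $\theta - \tfrac12$.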


Analysis of residuals 
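37.18 A technique which is useful in studying departures from a postulated linear
model, and possibly also for suggesting a power transformation to reduce these departures,
is given by Anscombe (1961) and by Anscombe and Tukey (1963).
We confine ourselves to the model with a general mean (say, $\theta_0$) which may be
written in the form
$$\mathbf{y} = \mathbf{1}\theta_0 + \mathbf{W}\boldsymbol\theta + \boldsymbol\epsilon, \qquad (37.33)$$
where $\mathbf{1}$ is a vector of units, of order $(N \times 1)$ like $\boldsymbol\epsilon$ and $\mathbf{y}$, $\mathbf{W}$ is a $(N \times p)$ matrix and
$\boldsymbol\theta$ a $(p \times 1)$ vector. The effect of introducing a general mean $\theta_0$ is (cf. Exercise 19.1)
to replace, in the LS estimator of $\boldsymbol\theta$, the elements $y_i$ of $\mathbf{y}$ by the deviations $z_i = y_i - \bar y$,
forming a vector
$$\mathbf{z} = \mathbf{y} - \mathbf{1}\,\mathbf{y}'\mathbf{1}/N, \qquad (37.34)$$
and also to replace the elements $w_{ij}$ of $\mathbf{W}$ by the deviations from the column means
$$x_{ij} = w_{ij} - \bar w_{.j},$$
forming a matrix $\mathbf{X}$. Thus we have
$$\mathbf{z}'\mathbf{1} = 0, \qquad \mathbf{X}'\mathbf{1} = \mathbf{0}. \qquad (37.35)$$
We lose no generality, therefore, by assuming from the beginning that the model is
$$\mathbf{y} = \mathbf{1}\theta_0 + \mathbf{X}\boldsymbol\theta + \boldsymbol\epsilon, \qquad (37.36)$$
where (37.35) holds. Then the LS estimator of $\theta_0$ is $\bar y$ and that of $\boldsymbol\theta$ is, by (19.12),
$$\hat{\boldsymbol\theta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{z}.$$
We define the matrix $\mathbf{M} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$, and denote the vector of fitted values by
$$\mathbf{f} = \mathbf{X}\hat{\boldsymbol\theta} = \mathbf{M}\mathbf{z}$$
and the vector of residuals from the fitted model by
$$\mathbf{r} = \mathbf{z} - \mathbf{f} = (\mathbf{I} - \mathbf{M})\mathbf{z}.$$
By (37.35),
$$\mathbf{f}'\mathbf{1} = \mathbf{r}'\mathbf{1} = 0, \qquad (37.37)$$
and since $\mathbf{M}$ is idempotent, we also have
$$\mathbf{M}\mathbf{r} = \mathbf{M}(\mathbf{I} - \mathbf{M})\mathbf{z} = \mathbf{0}. \qquad (37.38)$$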


37.19 Now suppose that, after fitting the model, we construct a scatter diagram 
(cf. Example 26.7, Vol. 2) with the fitted values as abscissae and the corresponding 
residuals as ordinates. By (37.37), the mean ordinate is zero. Further, since $\mathbf{M}$ is
symmetric, $\mathbf{f}'\mathbf{r} = (\mathbf{M}\mathbf{z})'\mathbf{r} = \mathbf{z}'\mathbf{M}\mathbf{r} = 0$ by (37.38), so the fitted values are uncorrelated
with the residuals, and the regression lines in the scatter diagram are at right angles
(cf. 26.9).
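These two general properties are easy to confirm numerically. The sketch below treats the simplest case of a single regressor ($p = 1$), where the LS estimator reduces to $\sum x_i z_i / \sum x_i^2$; the function name and the data are invented purely for illustration.

```python
def fit_centred_line(w, y):
    """Fitted values f and residuals r for one regressor, in deviation form.

    Illustrative sketch for p = 1; names and data are assumptions.
    """
    N = len(y)
    w_bar = sum(w) / N
    y_bar = sum(y) / N
    x = [wi - w_bar for wi in w]          # deviations from the column mean
    z = [yi - y_bar for yi in y]          # deviations from the general mean
    theta_hat = sum(xi * zi for xi, zi in zip(x, z)) / sum(xi * xi for xi in x)
    f = [theta_hat * xi for xi in x]      # fitted values
    r = [zi - fi for zi, fi in zip(z, f)]  # residuals
    return f, r


f, r = fit_centred_line([1.0, 2.0, 4.0, 7.0, 9.0], [2.1, 2.9, 5.2, 6.8, 10.4])
mean_residual = sum(r) / len(r)                    # zero, up to rounding
cross_product = sum(fi * ri for fi, ri in zip(f, r))  # zero, up to rounding
```

Both quantities vanish apart from rounding error, whatever data are supplied, which is why any visible pattern in the scatter diagram must come from its other features.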

Apart from these two general properties of the scatter diagram, we may use its 
other features to examine how well the assumptions of the fitted model are satisfied. 
In particular the homoscedasticity assumption may be checked roughly in terms of the
dispersion of the residuals for different sub-ranges of fitted values, and the normality 
assumption may be checked in the same way, especially so far as skewness is concerned. 
In each case, an appropriate transformation may be made by the methods discussed 
earlier in this chapter if the assumption is found to be inadequate. 

Perhaps the most interesting use of the scatter diagram, however, is to check addi- 
tivity assumptions in multi-factor experiments. Non-additivity can manifest itself by 
evident non-linear (say, quadratic) regression of the residuals upon fitted values. 


37.20 The rough visual methods described above can be translated into numerical
terms. Anscombe (1961) proposed to use the statistics
$$t_{pq} = \mathbf{r}_p'\,\mathbf{f}_q,$$
where $\mathbf{r}_p$ is the vector of the $p$th powers of the residuals (e.g. $\mathbf{r}$ defined above is $\mathbf{r}_1$)
and $\mathbf{f}_q$ is the vector of $q$th powers of the fitted values ($\mathbf{f} = \mathbf{f}_1$). $t_{30}$ and $t_{40}$ are obvious
analogues of the usual measures of skewness and kurtosis. $t_{21}$ measures heteroscedasticity,
since it essentially gives the covariance of the squared residuals with the fitted
values. $t_{12}$ measures non-additivity on the lines indicated at the end of 37.19. In
fact, it is very closely related to the statistic used at (35.69) for testing additivity
in a two-way cross-classification with one observation per cell: the numerator of that
statistic is just $t_{12}$.
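A minimal sketch of these statistics follows. The function name and the small vectors are illustrative assumptions; no standardization is applied, so the values are the raw sums of products only.

```python
def t_pq(r, f, p, q):
    """Sum over observations of (residual ** p) * (fitted value ** q)."""
    return sum((ri ** p) * (fi ** q) for ri, fi in zip(r, f))


# Invented residuals and fitted values, each summing to zero.
r = [0.5, -1.0, 0.25, 0.75, -0.5]
f = [-2.0, -1.0, 0.0, 1.0, 2.0]

skewness_like = t_pq(r, f, 3, 0)       # t_30
kurtosis_like = t_pq(r, f, 4, 0)       # t_40
heteroscedasticity = t_pq(r, f, 2, 1)  # t_21
non_additivity = t_pq(r, f, 1, 2)      # t_12
```

Taking $q = 0$ makes the vector of powers of the fitted values a vector of units, so that $t_{10}$ is simply the sum of the residuals.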


37.21 Approximate sampling theory for the $t_{pq}$ (suitably standardized) is developed
by Anscombe (1961) and leads to approximations to the power transformations required
to remove the corresponding departure from the model's assumptions. In accordance
with our discussion of 37.9, there is no guarantee that all these statistics will point to
the same power transformation, but a general hope that they will not differ by much.
In this connexion, it is interesting that Box and Cox (1964) (see also the discussion of
their paper by Anscombe) expressed the ML solution (37.7) for the power transformation
approximately in terms of the $t_{pq}$ with $p+q = 3$ and 4. In essence, the ML
estimator carries out a kind of averaging process between the various power transforma- 
tions suggested by the individual measurements of heteroscedasticity, non-normality 
and non-additivity. It is not the least of its virtues that the ML approach automatically 
effects what might otherwise be a difficult compromise to make. 


The robustness of AV procedures 

37.22 We first consider estimation of the parameters in AV problems. Where 
Model I AV is concerned, LS estimation theory and its optimum properties (set out in 
19.4-9, Vol. 2) does not at all involve the assumption of normality for the errors. Thus 
all estimates remain valid, and so do their estimated variances, in face of non-normality: 
LS estimation is distribution-free to this extent. The normality assumption was re- 
quired in 24.27-37 for hypothesis-testing and interval estimation purposes only. 

Further, even if the basic LS model (19.8) is incorrect in respect of its assumption 
of uncorrelated, homoscedastic errors, this will not bias the LS estimator (19.12), for 
(19.13-14) hold so long as the errors have zero means. But the MV properties are lost 
in this case, since they now pass to the true LS estimator (19.59). Thus heteroscedasticity
and correlation of the errors merely reduce efficiency without importing bias.
For model II AV, and the other models which are considered in Chapter 36, it is easy 
to see that the expectations of Mean Squares are unaffected by the failure of the normality 
assumption, so that estimators of variance components remain unbiassed in the presence 
of non-normality of the various random variables. However, the variances of these 
estimators (e.g. in the simplest case given in Exercise 36.10) are radically changed by 
non-normality, because they are no longer simplified by the special relations between 
the cumulants of the normal distribution. 


37.23 So far as tests (and the corresponding interval estimators) are concerned, we 
noticed in 31.2-9, Vol. 2, the outstanding general feature of the effects of non-normality 
upon normal-theory procedures: tests on means are robust, while tests on variances are 
not. This generalization leads us to expect that tests and interval estimates in Model I, 
which is essentially concerned with means, will be relatively robust to non-normality; 
and that those in Model II and other AV models, which are concerned with variances, 
will not be robust. We treat robustness to non-normality in detail in 37.24-35, but 
here remark that these statements have been substantially justified in some detailed 
investigations, very fully summarized in the final chapter of Scheffé (1959); an earlier 
review by Cochran (1947) may also be consulted. 

These investigations (e.g. Horsnell (1953); Box (1954) ) also showed that in Model I 
the effects of heteroscedasticity of errors can be large in general, but are not serious 
when equal frequencies are used in all cells of the classification. (We have previously
encountered this effect in simpler form in 21.24 (cf. also 31.4).) The practical implication
is that on grounds of robustness alone equal cell-frequencies should be used wherever 
possible when the observations are designed. As a happy side-effect, computations are 
made much easier by this conclusion. Further, this robustness to heteroscedasticity 
in the balanced case permits us to make a simple approximate AV of cell means when 
all frequencies are unequal (cf. Exercises 37.7-8). 

The effects of stochastic dependence among the errors can be extreme (Box, 1954). 
This recalls a general point made in 36.39-43, that randomization methods of allocating 
experimental units (which may obviously introduce some dependencies among the 
errors) should be taken into account in the analysis. We shall return to these methods 
in Chapter 38. 


Robustness to non-normality in the linear model 
37.24 An interesting approach due to Box and Watson (1962), following earlier
work by Box and Andersen (1955), throws some general light upon the reasons for
varying degrees of robustness to non-normality.
We return to the linear model with general mean, defined in 37.18. The SS
attributable to the fitted model there is
$$S_\theta = \mathbf{f}'\mathbf{f} = \mathbf{z}'\mathbf{M}\mathbf{z},$$
and the Residual SS is
$$S_R = \mathbf{r}'\mathbf{r} = \mathbf{z}'(\mathbf{I} - \mathbf{M})\mathbf{z}.$$
When the errors are normally distributed, the LR test of $\boldsymbol\theta = \mathbf{0}$ is based on
$$F = (S_\theta/p)\big/\{S_R/(N-p-1)\},$$
distributed in the variance-ratio form with d.fr. $\nu_1 = p$, $\nu_2 = N-p-1$. The test can
equivalently be carried out on
$$W = \frac{S_\theta}{S_\theta + S_R} = \frac{\mathbf{z}'\mathbf{M}\mathbf{z}}{\mathbf{z}'\mathbf{z}}, \qquad (37.39)$$
for since
$$W = \left(1 + \frac{\nu_2}{\nu_1 F}\right)^{-1} \qquad (37.40)$$
it is a monotone increasing function of $F$. In the normal case, when $\boldsymbol\theta = \mathbf{0}$, $1/F$ has
the variance-ratio distribution with $(N-p-1, p)$ d.fr. and $W$ is the Beta variable, with
parameters $\{\tfrac12(N-p-1), \tfrac12 p\}$, obtained from $1/F$ by the transformation in 16.19.

We now study the distribution of W in the general (non-normal) case. Its de- 
nominator is invariant under permutation of the elements of z. We therefore first 
consider this permutation distribution (cf. 31.16) of W. If the joint distribution of the 
elements of z is symmetric in its N arguments, as will be so in particular when the 
errors are independently and identically distributed, each permutation of them has 
probability $(N!)^{-1}$.

Once we have obtained the mean and variance of $W$ in this permutation distribution,
say $E_P(W)$ and $V_P(W)$, we shall be able to obtain the unconditional mean $E(W)$ and
variance $V(W)$ from them if we know the parent distribution from which $\mathbf{z}$ was sampled.


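37.25 Since $\mathbf{z}'\mathbf{M}\mathbf{z}$ is a scalar,
$$\mathbf{z}'\mathbf{M}\mathbf{z} = \mathrm{tr}(\mathbf{z}'\mathbf{M}\mathbf{z}) = \mathrm{tr}(\mathbf{M}\mathbf{z}\mathbf{z}'),$$
where we commute under the trace operator. Thus (37.39) gives
$$\mathbf{z}'\mathbf{z}\,E_P(W) = E_P\{\mathrm{tr}(\mathbf{M}\mathbf{z}\mathbf{z}')\} = \mathrm{tr}\{\mathbf{M}\,E_P(\mathbf{z}\mathbf{z}')\}.$$
Now
$$E_P(\mathbf{z}\mathbf{z}') = \frac{\mathbf{z}'\mathbf{z}}{N(N-1)}\,(N\mathbf{I} - \mathbf{1}\mathbf{1}'), \qquad (37.41)$$
since
$$E_P(z_j^2) = \mathbf{z}'\mathbf{z}/N, \qquad E_P(z_j z_l) = -\mathbf{z}'\mathbf{z}/\{N(N-1)\}$$
for $j \neq l$. Substitution of (37.41) gives
$$E_P(W) = \frac{1}{N(N-1)}\,\mathrm{tr}\{\mathbf{M}(N\mathbf{I} - \mathbf{1}\mathbf{1}')\} = \frac{1}{N-1}\,\mathrm{tr}(\mathbf{M}) = p/(N-1), \qquad (37.42)$$
since $\mathbf{M}\mathbf{1}\mathbf{1}' = \mathbf{0}$ by (37.35) and $\mathrm{tr}(\mathbf{M}) = p$ from 19.9.
(37.42) shows that $E_P(W)$ does not depend upon $\mathbf{X}$ or upon $\mathbf{z}$. Thus, whatever
the distribution of the errors, say $f$, the unconditional mean of $W$ will also be
$$E(W) = E\{E_P(W)\} = p/(N-1). \qquad (37.43)$$
In particular, this will hold in the normal case, so that the mean of $W$ is completely
robust to departures from normality.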
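The constancy of the permutation mean can be checked by brute force for a small sample. The sketch below assumes a single centred regressor ($p = 1$), for which $\mathbf{M} = \mathbf{x}\mathbf{x}'/\mathbf{x}'\mathbf{x}$ and hence $W = (\mathbf{x}'\mathbf{z})^2/(\mathbf{x}'\mathbf{x}\,\mathbf{z}'\mathbf{z})$; the function names and data are invented for illustration.

```python
from itertools import permutations


def w_statistic(x, z):
    """W = z'Mz / z'z for one centred regressor x (illustrative, p = 1)."""
    sxz = sum(a * b for a, b in zip(x, z))
    return sxz * sxz / (sum(a * a for a in x) * sum(b * b for b in z))


def permutation_mean_w(x, z):
    """Average of W over all N! equiprobable permutations of z."""
    values = [w_statistic(x, perm) for perm in permutations(z)]
    return sum(values) / len(values)


# Both vectors are in deviation form (they sum to zero), with N = 5.
x = [-3.0, -1.0, 0.0, 1.0, 3.0]
z = [1.5, -0.5, 2.0, -4.0, 1.0]
mean_w = permutation_mean_w(x, z)   # equals p/(N-1) = 1/4 up to rounding
```

Changing the values in `x` or `z`, while keeping each vector centred, leaves the result at $1/4$.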


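37.26 To obtain $V_P(W)$, we first find $E_P(W^2)$. Writing $M_{rs}$ for the elements of $\mathbf{M}$,
$$\mathbf{z}'\mathbf{M}\mathbf{z} = \sum z_r^2 M_{rr} + \sum z_r z_s M_{rs}, \qquad (37.44)$$
where we now always sum over all possible unequal values of the subscripts. Squaring
(37.44) and taking expectations over all permutations of the elements of $\mathbf{z}$, we find
$$\begin{aligned} E_P\{(\mathbf{z}'\mathbf{M}\mathbf{z})^2\} = {} & E_P(z_r^4)\sum M_{rr}^2 + E_P(z_r^3 z_s)\,4\sum M_{rr}M_{rs} \\ & + E_P(z_r^2 z_s^2)\left(2\sum M_{rs}^2 + \sum M_{rr}M_{ss}\right) \\ & + E_P(z_r^2 z_s z_t)\left(4\sum M_{rs}M_{rt} + 2\sum M_{rr}M_{st}\right) \\ & + E_P(z_r z_s z_t z_u)\sum M_{rs}M_{tu}. \end{aligned} \qquad (37.45)$$
We now write $s_r$ for the $r$th power-sum of the $z$'s, as at (12.8) (so that $s_1 = 0$ by (37.35),
while $\mathbf{z}'\mathbf{z} = s_2$), and use the relations between the augmented symmetric functions
and power-sums to evaluate the expectations in (37.45), using (12.9). From the
weight 4 section of Appendix Table 10, we find, writing $N^{(j)} = N(N-1)\cdots(N-j+1)$,
$$\begin{aligned} N\,E_P(z_r^4) &= s_4, \\ N^{(2)}E_P(z_r^3 z_s) &= -s_4, \\ N^{(2)}E_P(z_r^2 z_s^2) &= s_2^2 - s_4, \\ N^{(3)}E_P(z_r^2 z_s z_t) &= 2s_4 - s_2^2, \\ N^{(4)}E_P(z_r z_s z_t z_u) &= 3s_2^2 - 6s_4. \end{aligned} \qquad (37.46)$$
Also, we may express all the sums involving the $M_{rs}$ in (37.45) in terms of
$m = \sum_{r=1}^{N} M_{rr}^2$, using the idempotency of $\mathbf{M}$, the value $p$ of its trace, and the fact that
$\mathbf{M}\mathbf{1} = \mathbf{0}$ by (37.35). These relations are:
$$\begin{aligned} \sum M_{rr}M_{rs} &= -m, \\ \sum M_{rs}^2 &= p - m, \\ \sum M_{rr}M_{ss} &= p^2 - m, \\ \sum M_{rs}M_{rt} &= 2m - p, \\ \sum M_{rr}M_{st} &= 2m - p^2, \\ \sum M_{rs}M_{tu} &= p^2 + 2p - 6m. \end{aligned} \qquad (37.47)$$
Substituting (37.46-7) into (37.45), and writing
$$k_2 = s_2/(N-1), \qquad k_4 = \{N(N+1)s_4 - 3(N-1)s_2^2\}\big/\{(N-1)(N-2)(N-3)\},$$
the k-statistics of the $y$'s by (12.28), we find, on dividing by $(\mathbf{z}'\mathbf{z})^2 = s_2^2$, subtracting
$\{E_P(W)\}^2$ and simplifying,
$$V_P(W) = \frac{2p(N-p-1)}{(N+1)(N-1)^2} + \frac{k_4}{k_2^2}\cdot\frac{1}{N(N-1)^2}\left[Nm - p^2 - \frac{2p(N-p-1)}{N+1}\right]. \qquad (37.48)$$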
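The variance formula can be compared with a direct enumeration of the permutation distribution. The following sketch again assumes one centred regressor ($p = 1$), so that $M_{rr} = x_r^2/\sum x^2$; the closed form is written as a normal-theory term plus a multiple of $k_4/k_2^2$, as in (37.48). Function names and data are assumptions for illustration.

```python
from itertools import permutations


def perm_var_w_formula(x, z):
    """Closed-form permutation variance of W for p = 1 (illustrative)."""
    N, p = len(z), 1
    sx2 = sum(a * a for a in x)
    m = sum((a * a / sx2) ** 2 for a in x)        # sum of squared diagonals of M
    s2 = sum(b ** 2 for b in z)
    s4 = sum(b ** 4 for b in z)
    k2 = s2 / (N - 1)
    k4 = (N * (N + 1) * s4 - 3 * (N - 1) * s2 ** 2) / ((N - 1) * (N - 2) * (N - 3))
    normal_term = 2 * p * (N - p - 1) / ((N + 1) * (N - 1) ** 2)
    bracket = N * m - p * p - 2 * p * (N - p - 1) / (N + 1)
    return normal_term + (k4 / k2 ** 2) * bracket / (N * (N - 1) ** 2)


def perm_var_w_exact(x, z):
    """Variance of W over all N! permutations of z, by enumeration."""
    sx2 = sum(a * a for a in x)
    sz2 = sum(b * b for b in z)
    values = [sum(a * b for a, b in zip(x, perm)) ** 2 / (sx2 * sz2)
              for perm in permutations(z)]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


x = [-2.0, -1.0, 0.0, 0.5, 2.5]     # centred regressor
z = [1.5, -0.5, 2.0, -4.0, 1.0]     # centred observations
formula_value = perm_var_w_formula(x, z)
exact_value = perm_var_w_exact(x, z)
```

The two values agree to rounding error. For instance, with $x = z = (1, -1, 0, 0)$ both routes give $7/72$.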


37.27 Unlike (37.42), (37.48) depends on $\mathbf{X}$ through $m$, and upon $\mathbf{z}$ through the
ratio $k_4/k_2^2$. Because we found $E_P(W) = E(W)$, we see that
$$V(W) = E(W^2) - \{E(W)\}^2,$$
while
$$V_P(W) = E_P(W^2) - \{E_P(W)\}^2,$$
so that if $f$ is the distribution of the errors,
$$V(W) = E_f\{V_P(W)\}. \qquad (37.49)$$
The unconditional variance of $W$ thus depends essentially on $E(k_4/k_2^2)$ in (37.48). In
the normal case,
$$E(k_4/k_2^2) = E(k_4)/E(k_2^2) = 0,$$
since $k_2$ is distributed independently of $k_4/k_2^2$. ($k_2$ is a complete sufficient statistic for,
and $k_4/k_2^2$ is distributed free of, the scale parameter, and Exercise 23.7 applies.) Thus
the first term on the right of (37.48) is the normal-theory unconditional variance, and
we rewrite the result as
$$V_P(W) = \{V(W)\}_{\text{normal}}\left[1 + C_2\left\{\frac{(N+1)(Nm - p^2)}{2Np(N-p-1)} - \frac{1}{N}\right\}\right], \qquad (37.50)$$
where $C_2 = k_4/k_2^2$.


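37.28 Exercise 37.10 asks the reader to show that $m$ can be expressed in terms of
the k-statistic ratios $(k_4/k_2^2)_i$ and $\{k_{22}/(k_{20}k_{02})\}_{ij}$ of the $p$ regressors $x$. Using that
result, (37.50) may be written in the form
$$V_P(W) = \{V(W)\}_{\text{normal}}\left[1 + \frac{N-3}{2N(N-1)}\,C_2\,C_X\right], \qquad (37.51)$$
where, written in terms of $m$,
$$C_X = \frac{N-1}{N-3}\left[\frac{N(N+1)}{p(N-p-1)}\left(m - \frac{p^2}{N}\right) - 2\right] \qquad (37.52)$$
and
$$\{V(W)\}_{\text{normal}} = \frac{2p(N-p-1)}{(N+1)(N-1)^2}. \qquad (37.53)$$
When $m$ is expressed through the k-statistic ratios of the regressors, (37.52) shows that
$C_X$ is a multivariate generalization of the univariate kurtosis ratio
$k_4/k_2^2$, to which it reduces when $p = 1$. It has zero mean in the normal case, by the
argument given for $k_4/k_2^2$ in 37.27.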


37.29 The permutation distribution's moments (37.42) and (37.51) permit us to 
fit a continuous distribution to the discrete permutation distribution of W in the manner 
of 31.47, by choosing a Beta distribution with the same mean and variance. Since 
both W and the fitted distribution are on the range (0, 1), and we know (cf. 37.24) that 
this distribution holds exactly for the unconditional distribution of W in normal samples, 
there is a reasonable hope of obtaining a good approximation to the general permutation 
distribution. 

The mean and variance of a Beta distribution with parameters $\tfrac12\nu_1$, $\tfrac12\nu_2$ are (from
Example 2.8)
$$\mu_1' = \frac{\nu_1}{\nu_1+\nu_2}, \qquad \mu_2 = \frac{2\nu_1\nu_2}{(\nu_1+\nu_2+2)(\nu_1+\nu_2)^2},$$
whence
$$\nu_1 = 2\mu_1'\left\{\frac{\mu_1'(1-\mu_1')}{\mu_2} - 1\right\}, \qquad \nu_2 = \nu_1\,\frac{1-\mu_1'}{\mu_1'}. \qquad (37.54)$$
$\mu_1'$ and $\mu_2$ are to be equated to $E_P(W)$ and $V_P(W)$ respectively.
Since $\nu_2/\nu_1$ is a function of $\mu_1'$ only by (37.54), and $E_P(W)$ is constant at $p/(N-1)$,
we see that $\nu_2$ will require adjustment by the same factor as $\nu_1$. Substituting $E_P(W)$
for $\mu_1'$ and $V_P(W)$ for $\mu_2$ in (37.54), we find
$$\nu_1 = \left\{\frac{N+1}{1+c} - 2\right\}\frac{p}{N-1}, \qquad (37.55)$$
where $c$ is the "correction factor" in (37.51), i.e.
$$c = \frac{N-3}{2N(N-1)}\,C_2\,C_X. \qquad (37.56)$$
It follows that
$$\nu_2 = \left\{\frac{N+1}{1+c} - 2\right\}\frac{N-p-1}{N-1} \qquad (37.57)$$
also. If either $C_2$ or $C_X = 0$, $c = 0$ and normal theory holds for $V_P(W)$, to our
approximation, whatever the underlying distribution of the errors.
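The adjustment of the d.fr. amounts to multiplying both $p$ and $N-p-1$ by a common factor determined by $c$. A minimal sketch follows; the function name is an assumption, and $C_2$ and $C_X$ are taken as already computed.

```python
def adjusted_dfr(N, p, C2, CX):
    """Approximating d.fr. (nu_1, nu_2) for the F-test, as at (37.55)-(37.57).

    Illustrative sketch: C2 is the kurtosis ratio of the observations and
    CX the corresponding quantity for the regressors.
    """
    c = (N - 3) * C2 * CX / (2.0 * N * (N - 1))      # correction factor
    multiplier = ((N + 1) / (1.0 + c) - 2.0) / (N - 1)
    return multiplier * p, multiplier * (N - p - 1)


# With C2 = 0 the normal-theory d.fr. (p, N-p-1) are recovered.
nu1_normal, nu2_normal = adjusted_dfr(N=20, p=3, C2=0.0, CX=5.0)

# Positive c (here C2 and CX of like sign) reduces both d.fr.
nu1_adj, nu2_adj = adjusted_dfr(N=20, p=3, C2=1.2, CX=4.0)
```

Because the same multiplier is applied to both d.fr., their ratio $\nu_2/\nu_1$ is unchanged, in line with the remark that it depends on the mean alone.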


37.30 (37.49) and (37.51) show that the unconditional variance $V(W)$ is simply
$V_P(W)$ with $C_2 C_X$ replaced by $E(C_2 C_X)$. With this modification, the approximation
of 37.29 holds for the unconditional distribution of $W$ as well as for its permutation
distribution.


37.31 Since the approximating d.fr. defined at (37.55-7) depend essentially on
the correction factor $c$, it is of interest to find bounds for its constituents. Exercise 37.11
is to show that
$$p^2/N \le m \le p(N-1)/N, \qquad (37.58)$$
and hence, from (37.52),
$$-2 \le C_X(N-3)/(N-1) \le N-1. \qquad (37.59)$$
We see that, if these bounds for $C_X$ are attained or nearly so, the correction factor $c$ at
(37.56) will be of order $N^{-1}$ near the lower bound and of order $N^0$ near the upper
bound. This will determine, at least in large samples, whether the correction factor
is negligible or not, i.e. whether the distribution of $W$ is robust or not. Since we have
seen at (37.52) that the deviation of $C_X$ from zero is a measure of multivariate normality
for the $x$'s, we may say that normality of the regressors produces robustness to
non-normality in $W$.


Robustness to non-normality in one-way AV 

37.32 We now apply the general results for the linear model in 37.24-31 to the
particular case of a one-way classification. Since there are $(p+1)$ parameters $(\theta_0, \boldsymbol\theta)$
in (37.36), we suppose that there are $(p+1)$ groups in the classification; the re-parametrized
form of the model in the later part of Example 35.1 will then be non-singular,
since we do not have a surplus parameter producing singularity.
Suppose that there are $n_j$ observations in the $j$th group, $\sum_{j=1}^{p+1} n_j = N$. Each of the $p$
regressors must now simply indicate whether an observation does or does not fall into
a particular group; membership of the $(p+1)$th group is implied by non-membership
of all the others. Ordinarily, we should define the regressors as 0-1 variables for this
purpose, but we must satisfy (37.36) and therefore instead define
$$x_{ij} = \begin{cases} 1 - n_j/N & \text{when the observed } y_i \text{ falls into the } j\text{th group}, \\ -n_j/N & \text{when it does not}, \end{cases} \qquad (37.60)$$
for $i = 1, 2, \ldots, N$ and $j = 1, 2, \ldots, p$. Then $\sum_{i=1}^{N} x_{ij} = 0$ as required for each $j$.

In this case we know from Example 35.1 that the SS attributable to the fitted model
is, temporarily using bars instead of dot suffixes for averages,
$$\mathbf{z}'\mathbf{M}\mathbf{z} = \mathbf{y}'\mathbf{M}\mathbf{y} = \sum_{j=1}^{p+1} n_j(\bar y_j - \bar y)^2 = \sum_{j=1}^{p+1}\frac{\left(\sum_{i=1}^{n_j} y_{ij}\right)^2}{n_j} - \frac{\left(\sum_{j=1}^{p+1}\sum_{i=1}^{n_j} y_{ij}\right)^2}{N}, \qquad (37.61)$$
and from (37.61) we see by considering the coefficient of $y_{ij}^2$ that $M_{rr} = 1/n_j - 1/N$ for
every member of the $j$th group. Thus
$$m = \sum_{r=1}^{N} M_{rr}^2 = \sum_{j=1}^{p+1} n_j\left(\frac{1}{n_j} - \frac{1}{N}\right)^2 = \sum_{j=1}^{p+1}\frac{1}{n_j} - \frac{2p+1}{N},$$
and (37.52) becomes
$$\frac{N-3}{N-1}\,C_X = \frac{N(N+1)}{p(N-p-1)}\left[\sum_{j=1}^{p+1}\frac{1}{n_j} - \frac{(p+1)^2}{N}\right] - 2, \qquad (37.62)$$
which can be substituted into (37.51) to get $V_P(W)$, first given by Welch (1937).
Evidently, the value of $C_X$ will depend critically on how the $N$ observations are allocated
to the $(p+1)$ groups.
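The dependence of $C_X$ on the allocation can be explored numerically. The sketch below takes only the group sizes as input and uses $M_{rr} = 1/n_j - 1/N$; the function name and the two allocations are illustrative assumptions.

```python
def cx_one_way(group_sizes):
    """Return (m, scaled) for a one-way classification with p+1 groups,
    where scaled = C_X (N-3)/(N-1) as in (37.62). Illustrative sketch only.
    """
    N = sum(group_sizes)
    p = len(group_sizes) - 1
    # m = sum over all observations of the squared diagonal element of M
    m = sum(n * (1.0 / n - 1.0 / N) ** 2 for n in group_sizes)
    scaled = N * (N + 1) / (p * (N - p - 1)) * (m - p * p / N) - 2.0
    return m, scaled


m_equal, scaled_equal = cx_one_way([5, 5, 5, 5])        # balanced, N = 20
m_extreme, scaled_extreme = cx_one_way([1, 1, 1, 17])   # three singletons
```

The balanced allocation gives $m = p^2/N$ and a scaled value of $-2$, while the allocation with three singleton groups gives a scaled value close to $N - 2$, so the two designs sit near opposite ends of the admissible range.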


37.33 If $n_j = N/(p+1)$, so that all groups have the same frequency, (37.62)
becomes
$$\frac{N-3}{N-1}\,C_X = -2, \qquad (37.63)$$
attaining the lower bound in (37.59). (37.56) is therefore
$$c = -C_2/N,$$
and the multiplier to be applied to $p$ and $(N-p-1)$ in (37.55) and (37.57) is
$$\left\{1 + \frac{2C_2}{N(N-1)}\right\}\bigg/\left(1 - \frac{C_2}{N}\right) \doteq 1 + \frac{C_2}{N},$$
which differs negligibly from unity in large samples.

As is indicated by the fact that zero lies between the bounds in (37.59), this negligible
correction can actually be reduced to zero (cf. Exercise 37.12) by making group frequencies
unequal in a certain way, but in general it seems unwise to produce this slight
further increase in robustness to non-normality at the expense of losing the robustness
to heteroscedasticity in the balanced case (cf. 37.23).


37.34 As an extreme contrast to the equal-frequencies case treated in 37.33,
consider the case $n_1 = n_2 = \ldots = n_p = 1$, $n_{p+1} = N-p$. (37.62) now becomes
$$\frac{N-3}{N-1}\,C_X = N - 1 - \frac{N+1}{N-p}, \qquad (37.64)$$
which is very near the upper bound in (37.59) if $p/N$ is small. (37.56) is
$$c = \frac{C_2}{2N}\left(N - 1 - \frac{N+1}{N-p}\right) \doteq \tfrac12 C_2\left(1 - \frac{2}{N}\right),$$
and the multiplier in (37.55) and (37.57) is approximately $(1+\tfrac12 C_2)^{-1}$, indicating extreme
non-robustness.

The point of most interest about this opposite extreme emerges only when we
calculate the SS (37.61), which is
$$\mathbf{y}'\mathbf{M}\mathbf{y} = \sum_{j=1}^{p}(y_j - \bar y)^2 + (N-p)(\bar y_{p+1} - \bar y)^2. \qquad (37.65)$$
Now it is clear that as $N$ increases, $\bar y_{p+1}$ and $\bar y = \{\sum_{j=1}^{p} y_j + (N-p)\bar y_{p+1}\}/N$ will differ
negligibly. Thus, to the first order in $N$, (37.65) is simply
$$\sum_{j=1}^{p}(y_j - \bar y_{p+1})^2 = \sum_{j=1}^{p}(y_j - \bar y_p)^2 + p(\bar y_p - \bar y_{p+1})^2, \qquad \text{where } \bar y_p = \sum_{j=1}^{p} y_j/p.$$
The F-test in 37.24 will be based on the ratio of this to
$$\sum_{j}\sum_{i}(y_{ij} - \bar y_j)^2 = \sum_{i=1}^{N-p}(y_{i,p+1} - \bar y_{p+1})^2.$$
Thus
$$F \doteq \frac{\left\{\sum_{j=1}^{p}(y_j - \bar y_p)^2 + p(\bar y_p - \bar y_{p+1})^2\right\}\big/p}{\sum_{i=1}^{N-p}(y_{i,p+1} - \bar y_{p+1})^2\big/(N-p-1)}. \qquad (37.66)$$
Apart from the term which involves $\bar y_p$ and $\bar y_{p+1}$ in the numerator, and the corresponding
extra degree of freedom there, (37.66) is the F-statistic for testing the equality
of variances in two normal populations, from samples of size $p$ and $N-p$ (cf. Exercise
23.14). In the light of the results of this section, it is easy to understand the extreme
non-robustness of the latter test, referred to in general terms in 37.23 and in more
detail in 31.6-8, where essentially the same correcting multiplier $(1+\tfrac12 C_2)^{-1}$ was justified
directly for tests on variances generally.


Robustness to normality in balanced classifications 


37.35 In the balanced one-way classification of 37.33, (37.63) and (37.52) show
that
$$m = \sum_{r=1}^{N} M_{rr}^2 = p^2/N. \qquad (37.67)$$
The lower bound of (37.59) will be attained, and a negligible correction for normality
will result as in 37.33, whenever (37.67) is satisfied, and in particular whenever
$$M_{rr} = p/N \qquad (37.68)$$
for all $r$, i.e. whenever the diagonal elements of $\mathbf{M} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ are all equal.
When a linear model satisfies (37.68) it is said to be quadratically balanced. It is
easy to see from considerations of symmetry alone that (37.68) holds for any cross-classification
with equal frequencies in every cell (previously called "balanced") and
for hierarchical classifications with equal frequencies at each stage of the hierarchy.
Atiqullah (1962), using a different method, derived this as an asymptotic result for the
unconditional distribution of $W$; his results also apply to other F-tests than the overall
test of $\boldsymbol\theta = \mathbf{0}$, which is the only one considered here.

In analysis of covariance in the balanced one-way classification, Atiqullah (1964)
showed that in accordance with the conclusion of 37.31, the extent of non-normality
of the concomitant variable determines the robustness of the F-test to non-normality
of the errors; robustness to other departures from assumptions is also considered.


Distribution-free methods in AV 

37.36 The study of robustness of the standard AV methods leads us naturally to
enquire whether other, completely robust, methods of analysis can be found. In other
words, are there distribution-free methods for AV problems?

We have already seen, in 31.70-4, that so far as the one-way classification is concerned,
the answer is in the affirmative: distribution-free tests exist for the equality of
the location parameters of $k$ samples from any continuous populations otherwise of
the same form. The test may be based on the ranks themselves, using the test statistic
(31.150), or on the normal scores $E(s, n)$ (cf. 31.71, Vol. 2). These tests are completely
robust for any continuous distribution and have very high asymptotic relative efficiencies
against normal location-shift alternatives (cf. 31.71). Another test, against ordered
alternatives, was given in 31.72-4 (cf. 35.66 and Exercise 35.15).


Quade (1967) gives a large-sample ranks test for the one-way classification in analysis 
of covariance. 


37.37 Further, the permutation distribution of W in the general linear model, 
discussed in 37.24-31, is distribution-free in the same way as permutation tests were 
in Chapter 31; the test of $\boldsymbol\theta = \mathbf{0}$ holds as an approximate test for the symmetry of $\mathbf{z}$
in its arguments whatever the underlying distribution of the errors. However, this
test of $\boldsymbol\theta = \mathbf{0}$ does not carry us very far into AV except for the one-way classification,
as we saw in 37.32-5. We now have to consider how far distribution-free methods 
may be of use in more complex AV situations, where we wish to test main effects, 
interactions, etc. 


Two-way cross-classification: permutation test 

37.38 Consider the simplest two-way cross-classification, with one observation 
per cell. Suppose that there are r rows and c columns, so that there are n = rc observa- 
tions in all, and that we wish to test column- (or row-) effects. In the spirit of our 
discussions of 31.21 and 31.39, the most natural procedure would be to replace the 
$n$ observations $y_{ij}$ by their ranks, or by some other set of conventional numbers such
as the normal scores, and carry out the usual AV tests upon these. It is difficult to
make any progress with the distribution theory of this procedure, since there is no set of 
equiprobable permutations from which to deduce results. 


37.39 However, we may develop a permutation test for column-effects from which,
as we shall see, distribution-free tests will emerge.

The usual AV SS for testing column-effects, say $S_C = r\sum_{j=1}^{c}(\bar y_{.j} - \bar y_{..})^2$, is invariant
under the addition of a constant to each observation in any row. If we take the mean
of each row to be zero, we make the rows SS, say $S_R = c\sum_{i=1}^{r}(\bar y_{i.} - \bar y_{..})^2$, equal to zero
since $\bar y_{i.} = \bar y_{..} = 0$, and also reduce $S_C$ to
$$S_C = r\sum_{j=1}^{c}\bar y_{.j}^2. \qquad (37.69)$$
Now if $S = \sum_{i}\sum_{j}(y_{ij} - \bar y_{..})^2$, the standard F-statistic is
$$F = \frac{S_C/(c-1)}{(S - S_R - S_C)\big/\{(r-1)(c-1)\}},$$
with d.fr. $\nu_1 = c-1$ and $\nu_2 = (r-1)(c-1)$. Like $S_C$, $S - S_R$ is invariant under arbitrary
addition of constants to the rows, and therefore $F$ is. We may therefore without loss
of generality put $S_R = 0$ and $F = (r-1)S_C/(S - S_C)$. Its Beta transform (cf. (37.40)) is
$$W = \left(1 + \frac{\nu_2}{\nu_1 F}\right)^{-1} = \left(1 + \frac{S - S_C}{S_C}\right)^{-1} = \frac{S_C}{S} = \frac{\sum_{j}\left(\sum_{i} y_{ij}\right)^2}{r\sum_{i}\sum_{j} y_{ij}^2} = \frac{1}{r}\left\{1 + \frac{2U}{(c-1)\sum_{i}(k_2)_i}\right\}, \qquad (37.70)$$
where
$$U = \sum_{i<l}\sum_{j=1}^{c} y_{ij}\,y_{lj}$$
and $(k_p)_i$ is the $p$th k-statistic of the $y$-values in the $i$th row, $y_{i1}, y_{i2}, \ldots, y_{ic}$.


37.40 We can now find the moments of (37.70), just as we did those of (37.40),
under a hypothesis of symmetry. Here, the hypothesis is that there are no column-effects
in the classification. Because of the invariance of $F$, and hence $W$, to constant
additions to the rows, we need make no assumption at all about row-effects. The
hypothesis implies that in each of the $r$ rows of the classification separately, every one
of the $c!$ permutations of the $y_{ij}$ ($j = 1, 2, \ldots, c$) is equiprobable. In all, therefore,
there are $(c!)^r$ equiprobable arrangements of the $y_{ij}$ under the hypothesis. The case
$c = 2$ corresponds to Fisher's test of bivariate symmetry discussed in 31.78.

Pitman (1938), whose paper should be consulted for details, found the first four
moments of $U$, and thence of $W$. He found
$$E_P(W) = \frac{1}{r}. \qquad (37.71)$$
At (37.71), just as at (37.42-3), the mean of the permutation distribution is the same
whatever the observations, and therefore coincides with the normal-theory result. The
variance, as at (37.52), is more complicated, being (cf. Exercise 37.13)
$$V_P(W) = \frac{2}{r^2(c-1)}\left[1 - \frac{\sum_{i}(k_2)_i^2}{\left\{\sum_{i}(k_2)_i\right\}^2}\right], \qquad (37.72)$$
with more complicated expressions for the higher moments.
The normal theory variance is, as in 37.29,
$$\{V(W)\}_{\text{normal}} = \frac{2\nu_1\nu_2}{(\nu_1+\nu_2+2)(\nu_1+\nu_2)^2} = \frac{2(r-1)}{r^2\{r(c-1)+2\}}, \qquad (37.73)$$
and, as at (37.49), this is the expectation of (37.72) under normality. Solving (37.72-3),
we find that
$$E\left[\frac{\sum_{i}(k_2)_i^2}{\left\{\sum_{i}(k_2)_i\right\}^2}\right] = \frac{c+1}{r(c-1)+2} \qquad (37.74)$$
in the normal case.

The d.fr. of the F-test can be adjusted as in 37.29; Exercise 37.14 gives an instance.
Pitman (1938) showed that when the mean and variance are made to agree with those
of the Beta distribution in this way, the third and fourth moments also generally show
good agreement.
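For a small table the $(c!)^r$ equiprobable arrangements can be enumerated directly, which gives a check on the mean (37.71) and the variance (37.72). The sketch below is illustrative only: the function names and the $3 \times 3$ data are assumptions.

```python
from itertools import permutations, product


def w_two_way(rows):
    """W = S_C / S for a table whose row means are already zero."""
    r = len(rows)
    c = len(rows[0])
    col_totals = [sum(row[j] for row in rows) for j in range(c)]
    S = sum(v * v for row in rows for v in row)
    return sum(t * t for t in col_totals) / (r * S)


def permutation_moments(y):
    """Mean and variance of W over independent permutations within rows."""
    centred = [[v - sum(row) / len(row) for v in row] for row in y]
    arrangements = product(*(list(permutations(row)) for row in centred))
    values = [w_two_way(arr) for arr in arrangements]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def variance_formula(y):
    """Permutation variance of W from the row k_2 statistics, as at (37.72)."""
    r, c = len(y), len(y[0])
    k2 = []
    for row in y:
        mu = sum(row) / c
        k2.append(sum((v - mu) ** 2 for v in row) / (c - 1))
    return 2.0 / (r * r * (c - 1)) * (1.0 - sum(k * k for k in k2) / sum(k2) ** 2)


y = [[3.1, 4.0, 2.2], [5.0, 7.5, 6.1], [1.0, 0.2, 2.9]]   # r = 3, c = 3
mean_w, var_w = permutation_moments(y)                     # 216 arrangements
```

Here the enumerated mean is $1/r = 1/3$ and the enumerated variance agrees with `variance_formula(y)` to rounding error.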


37.41 The most interesting special cases of $W$ for our present purposes are those
in which the observations $y_{ij}$ are replaced by conventional numbers, so that the test
is made completely distribution-free. Instead of the procedure outlined in 37.38,
we shall now replace the $y_{ij}$ in each row separately by a set of conventional numbers,
e.g. their ranks or the corresponding normal scores. If we use the same set in each
row, an immediate consequence is that the $(k_2)_i$ are identical for all values of $i$. Thus
$$\frac{\sum_{i}(k_2)_i^2}{\left\{\sum_{i}(k_2)_i\right\}^2} = \frac{r k_2^2}{(r k_2)^2} = \frac{1}{r} \qquad (37.75)$$
simply. If (37.75) is compared with (37.74), it is seen that they differ negligibly if
either $r$ or $c$ is not too small. The distribution-free test statistic using the same set of
conventional numbers to replace the observations in each row will then have approximately
the same distribution as the normal theory test. In particular, this is true if the
ranks or the normal scores are used in place of the observations.

It should be noted that when conventional numbers are used in the test, there is
no need to put $\bar y_{i.} = 0$, for $S_R$ will be $= 0$ in any case. Of course $\bar y_{..}$ (a constant)
must be restored to $S_C$ and $S$ in this case, as in Exercise 37.14.

The ARE of the ranks test is discussed in 38.65 below.
More complex classifications

37.42 Thus there are distribution-free AV tests of column- (or, by transposition, 
of row-) effects irrespective of the existence of row- (column-) effects, in a two-way 
cross-classification with one observation per cell. The simplicity of this situation 
arises because, as we saw in 37.39, the total SS (neglecting the general mean) has only
three components ($S_R$, $S_C$, $S - S_R - S_C$) of which one ($S_R$) may be rendered identically
zero by suitable choice of origin. $F$ is then a ratio of the only two random variables
in the problem. When conventional numbers are used, S too becomes a constant, 
so that the Beta transform W is just a constant multiple of the only remaining random 
variable, Sc. 

As soon as we begin to consider generalizing the permutation test, this simplicity 
disappears. Even in the balanced two-way cross-classification with more than one 
observation per cell, the total SS has a further (Interactions) component; for a three- 
way cross-classification with one observation per cell, also, an extra component appears. 
In consequence, the permutation distribution of the F- or W-statistic for column- 
effects in each of these cases (and a fortiori in more complex situations) will be difficult, 
and this is presumably why these tests have not been developed. 


37.43 An alternative method of generalization would be to consider the analogue 
of one of the distribution-free statistics in the more complex situations. For example, 
in Exercise 37.14, where ranks are used, W is a multiple of T, essentially the variance 
of the column total ranks. The distribution of this variance might be obtained for 
the balanced two-way cross-classification, and possibly also for the unequal-frequencies 
case; and for the three-way ($r \times c \times l$) cross-classification under the $(c!)^{rl}$ equiprobable
permutations of the column ranks within each row and each layer of the classification. 
So far as we know, neither of these generalizations has been carried out. It does not 
seem possible to obtain a distribution-free test for interactions by this method. 

Mehra and Sarangi (1967) have investigated the asymptotic theory of an aligned 
test for column-effects, suggested by Hodges and Lehmann (1962). Instead of the 
observations being ranked separately in each row, estimates of the row-effects are first 
subtracted from the observations, which are then ranked in a single sequence. A 
statistic which generalizes (37.70) is used, and unequal (non-zero) frequencies are 
allowed in the cells of the r xc classification. This test is more efficient in the normal 
case, especially for small values of c. 


Median tests 

37.44 A different approach to the construction of distribution-free AV tests was 
followed by Brown and Mood (1951). 'The principle of test construction which they 
used is (a) to estimate all parameters unspecified by the hypothesis by median statistics; 
and then (b) to test whether the residuals from this median-estimated model have half 
of their signs negative and half positive. 

For example, іп the one-way classification, with zt; observations in the jth group, 


k 
E n; = n, only the general mean и is left unspecified by the hypothesis of equal group 
-1 
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means. We therefore estimate и by the median ў of the л (assumed even) observations. 
In each group, we now see how many observations lie below and above $: 


Group: | 1 D os aetate s k TOTAL 
No.of f> ms; Cre н МАСА mk 4л 
observations |< ny—m, gag! ...... ne- ть in 
TOTAL т е E nk n 


(37.76) 


The hypothesis now is that all $k$ groups have the same median, i.e. $E(m_j) = \tfrac{1}{2} n_j$ for 
each $j$. It will be seen at once that this is the binomial homogeneity test treated in 
33.55. The statistic (33.122), which in the present notation is 

$$X^2 = 4 \sum_{j=1}^{k} \frac{(m_j - \tfrac{1}{2} n_j)^2}{n_j}, \qquad (37.77)$$

is asymptotically distributed in the $\chi^2$ form with $(k-1)$ d.fr. The test (which may be 
carried out exactly by the method of 33.19, Case 1) is distribution-free. As for the Sign 
test in 32.3, the use of the median reduces the problem to a binomial one. 
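
As an illustration of the one-way median test, the computation of the large-sample statistic may be sketched in Python. The function name and the data are hypothetical, chosen only so that n is even and no observation coincides with the pooled median; the sketch makes no provision for ties, and it takes the statistic to be four times the sum over groups of (m_j - n_j/2)^2 / n_j.

```python
from statistics import median


def median_test_statistic(groups):
    """Return (X2, d.fr.) for the one-way median test on a list of samples."""
    pooled = [x for g in groups for x in g]
    pooled_median = median(pooled)
    x2 = 0.0
    for g in groups:
        n_j = len(g)
        # m_j: number of observations in this group above the pooled median
        m_j = sum(1 for x in g if x > pooled_median)
        x2 += 4.0 * (m_j - 0.5 * n_j) ** 2 / n_j
    return x2, len(groups) - 1


# Hypothetical data: k = 3 groups, n = 12 observations in all.
groups = [
    [1.2, 2.3, 3.1, 4.8],
    [5.5, 6.1, 7.4, 2.9],
    [8.2, 9.0, 6.6, 7.7],
]
x2, df = median_test_statistic(groups)
print(x2, df)
```

The value of x2 would then be referred to the chi-squared distribution with df degrees of freedom; with groups as small as these the exact test would be preferable to the large-sample approximation.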


37.45 More complex AV situations may be treated by the same general method. 
For the two-way cross-classification, as Exercises 37.18-20 indicate, a variety of tests 
is available. With one observation per cell, or in the more general situation when 
there are no interactions, column- (or row-) effects may be tested; when interactions are 
present, column-effects may be tested against interactions, or column- and interaction- 
effects may be tested jointly. 


37.46 These median tests are attractive because of their computational simplicity 
and the fact that their theory is immediately available, at least in large samples, from 
that of $(2 \times c)$ contingency tables. However, not every problem is soluble by their 
use, e.g. there is no test known for column-effects against residual in the general 
balanced two-way cross-classification. Further, even when a test is available, it is not 
always distribution-free—Brown and Mood (1951) show that a median test for inter- 
actions in the balanced two-way classification is not. Finally, the efficiency of these 
median tests is not generally as high as that of tests using ranks or normal scores when 
the errors are near-normal. In 32.6-7, the Sign test was found to have ARE of $2/\pi$ in 
the normal case, against $3/\pi$ for the ranks test; and in Exercise 31.12, a median test of 
randomness was seen to have ARE of 0.78 against the 0.98 found in 31.38 for tests 
using ranks. Andrews (1954) showed that the one-way classification test discussed in 
37.44 has the same ARE, $2/\pi$, as the Sign test while, as we saw in 31.71, the comparable 
test statistic (31.150) based on ranks has ARE $3/\pi$. 


Bhapkar (1963) gave some efficiency results for the two-way classification median test. 


37.47 The restricted scope and relatively low efficiency of the median tests obtained 
by using the principle given in 37.44 are rather disappointing: intuitively, it seems 
that it ought to be possible to find general AV procedures with the high efficiencies 
which we saw in Chapter 31 to be characteristic of tests based on ranks. The nearest 
approaches to such procedures have been developed in a series of papers by Lehmann 
(1963a, b, c, 1964) and by Hodges and Lehmann (1963) (cf. also Hoyland (1965), 
Bickel (1965) and Adichie (1967a, b)). These procedures are only asymptotically distri- 
bution free, and it is interesting that they, too, are based on median estimation methods 
of a different sort from those in 37.44. 


37.48 Suppose that 
$$x_{ip} = \mu_i + \varepsilon_{ip} \qquad (i = 1, 2, \ldots, c;\ p = 1, 2, \ldots, n_i)$$
is the model for a set of $n = \sum_i n_i$ observations. The $\varepsilon_{ip}$ are independent, but 
otherwise we assume only that they have the same distribution. We write $\theta_{ij} = \mu_i - \mu_j$ 
for the parameters in terms of which all quantities of interest may be expressed. We 
shall discuss median estimators $\hat\theta_{ij}$ of $\theta_{ij}$. 

Let $y_{ij}$ be the median of the $n_i n_j$ differences $(x_{ip} - x_{jq})$, where $p = 1, 2, \ldots, n_i$; 
$q = 1, 2, \ldots, n_j$. The $y_{ij}$ are clearly estimators of the $\theta_{ij}$, but they do not possess the 
desirable transitivity property that 
$$\theta_{ij} + \theta_{jk} + \theta_{ki} = 0 \quad \text{for all } i, j, k. \qquad (37.78)$$
Adjusted estimators which satisfy (37.78) are 
$$\hat\theta_{ij} = \bar{y}_{i.} - \bar{y}_{j.} \qquad (37.79)$$
where $\bar{y}_{i.} = \frac{1}{c} \sum_{k} y_{ik}$ (with $y_{ii} = 0$). Lehmann (1963a) gives a numerical illustration showing that 
the $\hat\theta_{ij}$ agree well with the usual LS estimators $\tilde\theta_{ij}$. 

As $n \to \infty$ suitably, the $\hat\theta_{ij}$ tend to multivariate normality; they also have the same 
estimation efficiency, compared with the standard AV estimators based on means, as 
the Wilcoxon test has compared to "Student's" t-test. If $f$ is the common frequency 
function of the $\varepsilon_{ip}$, it follows from (31.115) that the efficiency is 
$$12 \sigma^2 \left\{ \int_{-\infty}^{\infty} f^2(x)\, dx \right\}^2 = k^2, \qquad (37.80)$$
say. By 31.60-1, $k^2$ may be infinite, but can never be less than 0.864 for any continuous 
$f$; in the normal case, $k^2 = 3/\pi \approx 0.95$. Thus, generally, $k n^{1/2} (\hat\theta_{ij} - \theta_{ij})$ has the same 
limiting distribution as $n^{1/2} (\tilde\theta_{ij} - \theta_{ij})$, where $\tilde\theta_{ij}$ is the standard (LS) AV estimator of $\theta_{ij}$. 
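
The construction of the median estimators may be made concrete by a short Python sketch. It assumes that the raw estimator for a pair of groups is the median of all differences between an observation in the first and an observation in the second, and that the adjusted estimators are differences of row means of the raw estimators, the diagonal entries being taken as zero. The function names and the three samples are hypothetical, introduced only for illustration.

```python
from itertools import product
from statistics import median


def pairwise_median_estimates(samples):
    """Raw estimators: median of all differences between groups i and j."""
    c = len(samples)
    y = [[0.0] * c for _ in range(c)]  # diagonal entries stay at zero
    for i in range(c):
        for j in range(c):
            if i != j:
                y[i][j] = median(a - b for a, b in product(samples[i], samples[j]))
    return y


def adjusted_estimates(y):
    """Adjusted estimators: differences of the row means of the raw estimators."""
    c = len(y)
    row_mean = [sum(row) / c for row in y]
    return [[row_mean[i] - row_mean[j] for j in range(c)] for i in range(c)]


# Hypothetical samples from c = 3 groups of unequal size.
samples = [
    [10.1, 11.4, 9.8],
    [12.0, 13.5, 12.7, 11.9],
    [9.0, 8.4, 10.2],
]
y = pairwise_median_estimates(samples)
theta = adjusted_estimates(y)

# The raw estimators need not be transitive; the adjusted ones always are.
print(y[0][1] + y[1][2] + y[2][0])
print(theta[0][1] + theta[1][2] + theta[2][0])
```

Because each adjusted estimator is a difference of two row means, the transitivity sum cancels identically, whatever the data.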


37.49 The asymptotic property stated in the last sentence of 37.48 implies that, 
provided that we can estimate $k^2$ at (37.80) consistently, we may set up analogues of all 
the usual AV procedures in terms of the median estimators $\hat\theta_{ij}$. Lehmann (1963c) 
gives two consistent estimators of $k^2$ and (1963b) develops large-sample confidence 
intervals for any contrast or set of contrasts in the parameters. Further, the same 
author (1964) (see also Hodges and Lehmann (1962) and Doksum (1967)) extends his 
results to the situation where there are "nuisance factors" in the observations (i.e. in 
the terminology of Chapter 38, "blocks" within the experiment) with equal numbers 
of observations in each cell of the same "block." Two-way classifications are discussed 
by Puri and Sen (1967b). V. L. Greenberg (1966) develops the theory for incomplete 
blocks designs, which are discussed in Chapter 38 below; see also Puri and Sen (1967a). 

Bhuchongkul and Puri (1965) extend the asymptotic theory to a class of estimators 
of contrasts including those based on normal scores. P. K. Sen (1966) develops 
simultaneous confidence intervals analogous to those in 35.57-63. 


Missing observations in the general linear model 

37.50 The advantages of balanced arrangements in Model I AV, namely ortho- 
gonality, ease of computation and superior robustness, are such that most designed 
analyses will seek to take advantage of them. Nevertheless, force of circumstances will 
sometimes lead to involuntary departures from the intended equality of frequencies: 
plants or animals may die, human subjects may prove reluctant to co-operate, or records 
may be lost before analysis. If this happens, we are always free to analyse the achieved 
unequal frequencies by the appropriate non-orthogonal methods, but, as we have seen, 
these are often complicated. Moreover, accidental losses of observations are rarely 
extreme; usually only one or a few are found to be missing. It is therefore worth 
investigating whether we can retain the original AV structure and correct it for the 
missing observations, rather than abandon it altogether. The discussion which follows 
holds for the general linear model, of which AV situations are a special case. 


37.51 Suppose, then, that $m$ of the $n$ intended observations are missing. Without 
loss of generality, we take these to be the last $m$ components of the observation vector $y$, 
which now become unknowns, say $u_1, \ldots, u_m$. Thus we may write $y = \begin{pmatrix} z \\ u \end{pmatrix}$, where 
$z$ $((n-m) \times 1)$ contains the actually observed values of $y$ and $u$ $(m \times 1)$ the unknown 
observations. In effect, we are presented with a fresh set of unknowns to estimate, in 
addition to the original parameters of the model. It is natural, in these circumstances, 
to estimate the values in $u$ by the same LS method as we use for the original parameters. 


37.52 The sum of squared residuals 
$$S = (y - X\theta)'(y - X\theta)$$
must therefore now be minimized not only for variation in $\theta$ (as was done in 19.4) but 
also for variation in $u$. If we first minimize $S$ with respect to $\theta$, we shall, of course, 
obtain the original LS solution (i.e. the LS solution if there had been no missing 
observation), but the estimator and the Residual SS of that solution will now both be 
functions of $u$, say $\hat\theta(u)$ and $S_1(u)$. The minimization process could now be 
completed by minimizing $S_1(u)$ for variation in $u$. 

However, this two-stage minimization procedure, which was suggested by Yates 
(1933), is not the easiest way in general. Instead, let us minimize $S$ first for variation 
in $u$. Partitioning $X$ into $\begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$ conformably with the partition $y = \begin{pmatrix} z \\ u \end{pmatrix}$, we have 
$$S = (z - X_1\theta)'(z - X_1\theta) + (u - X_2\theta)'(u - X_2\theta). \qquad (37.81)$$
Since only the second of the two non-negative terms on the right of (37.81) depends 
upon $u$, we reduce it to zero by putting 
$$u = X_2\theta; \qquad (37.82)$$
thus $S$ at (37.81) is reduced to its first term, which may then be minimized with respect 
to $\theta$. 

But the results of the two two-stage minimization methods just described must be 
the same. Thus if we obtain $\hat\theta(u)$ by the first method, which gives 
$$\hat\theta(u) = (X'X)^{-1}X'y = (X'X)^{-1}(X_1'z + X_2'u), \qquad (37.83)$$
and use this in conjunction with (37.82), we have 
$$u = X_2\hat\theta(u). \qquad (37.84)$$
(37.84) states that each missing observation is to be equated to its estimated expectation 
in the original LS analysis. 


37.53 (37.84) is a set of linear equations to be solved for $u$, and the solution $\hat{u}$ is 
then to be used in the original LS analysis. A straightforward solution was given by 
Tocher (1952). 

First, suppose that we replace $u$ by the null vector $0$ in the original LS analysis. 
(37.83) becomes 
$$\hat\theta(0) = (X'X)^{-1}X_1'z. \qquad (37.85)$$
Now consider 
$$\hat{u} = \{I - X_2(X'X)^{-1}X_2'\}^{-1} X_2\hat\theta(0). \qquad (37.86)$$
Observing from (37.83) that $\hat\theta(u) = \hat\theta(0) + (X'X)^{-1}X_2'u$, we find that (37.86) reduces to 
$$\{I - X_2(X'X)^{-1}X_2'\}\hat{u} = X_2\hat\theta(\hat{u}) - X_2(X'X)^{-1}X_2'\hat{u},$$
so that 
$$\hat{u} = X_2\hat\theta(\hat{u}). \qquad (37.87)$$
Thus $\hat{u}$ defined by (37.86) is the solution of (37.84). 


37.54 We have seen, therefore, that in order to estimate the m missing observa- 
tions u, so that we may preserve the computational form of the original LS analysis, 
we need only 


(a) perform the original analysis with u = 0 to obtain 6(0) at (37.85); 

(b) calculate $\hat{u}$ at (37.86); and 

(c) again perform the original analysis using $\hat{u}$ in $y$. 

It should be noted that the matrix in braces to be inverted in (37.86) is (m x m). 
Thus, if only one observation is missing, the matrix is a scalar and stage (b) above 
is very simple. 
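
The three stages may be sketched in Python for the simplest case, a straight-line regression with a single missing observation, so that the matrix to be inverted reduces to the scalar 1 - h, where h is the quadratic form of the missing row of the design in the inverse of X'X. The data and the helper names fit_line, leverage and residual_ss are hypothetical, introduced only for this illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    return (sxx * sy - sx * sxy) / det, (n * sxy - sx * sy) / det


def leverage(xs, x0):
    """Quadratic form of the row (1, x0) in the inverse of X'X for the full design."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return 1.0 / n + (x0 - xbar) ** 2 / sxx


def residual_ss(xs, ys, a, b):
    """Residual sum of squares about the fitted line."""
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))


xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # intended design; the response at x = 5 is missing
z = [2.1, 3.9, 6.2, 7.8]         # the observed responses

# (a) original analysis with the missing value replaced by zero
a0, b0 = fit_line(xs, z + [0.0])
# (b) the estimate of the missing value; the "matrix" to invert is a scalar
u_hat = (a0 + b0 * xs[-1]) / (1.0 - leverage(xs, xs[-1]))
# (c) original analysis repeated with u_hat inserted
a_hat, b_hat = fit_line(xs, z + [u_hat])

# For comparison: direct analysis of the observed values alone
a_obs, b_obs = fit_line(xs[:-1], z)
print(u_hat, (a_hat, b_hat), (a_obs, b_obs))
```

In this sketch the fit at stage (c) coincides with the direct fit to the four observed values, and the inserted value lies exactly on the fitted line, so that it contributes nothing to the Residual SS.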

The second of the four papers by Wilkinson (1957-60) on missing observations gives 
detailed solutions of (37.84) for many common AV situations. See also Biggers (1959). 


37.55 It is easy to see that the estimator $\hat\theta(\hat{u})$ obtained by using (37.86) in the 
original LS analysis is exactly the estimator which would have been obtained by using 
the $(n-m)$ observed values alone (generally in a non-orthogonal analysis). For, from 
(37.83), 
$$X'X\hat\theta(\hat{u}) = X_1'z + X_2'\hat{u} = X_1'z + X_2'X_2\hat\theta(\hat{u})$$
using (37.87). Thus 
$$X_1'z = (X'X - X_2'X_2)\hat\theta(\hat{u}) = X_1'X_1\hat\theta(\hat{u}) \qquad (37.88)$$
and (37.88) is precisely the set of equations satisfied by $\hat\theta$ when $z$ alone is analysed. 

This result, together with the disappearance of the second term on the right of 
(37.81), implies at once that the Residual SS obtained by using $\hat{u}$ in the original LS 
analysis is identical with that obtained when $z$ alone is analysed. However, the degrees 
of freedom for the Residual SS must obviously be reduced, since we now have only 
$(n-m)$ observations. If $X$ and $X_1$ both have the same rank (e.g. when both have full 
rank) the Residual SS will have its d.fr. reduced by $m$, the number of missing 
observations. More generally, the reduction will be (cf. Exercise 19.8) $m$ minus the difference 
in rank between $X$ and $X_1$. 


37.56 Although the Residual SS requires no adjustment, all the other SS in the 
AV table will be incorrect if $\hat{u}$ at (37.86) is used in the original LS analysis. This is 
most easily seen from the fact (cf. Example 35.4, 35.38 and Example 35.6) that each 
other SS in the AV table may be obtained as the difference between the Residual SS 
in two linear models, one of which is a restricted form of the other. Evidently, any 
of these Residual SS is correctly obtainable, by the argument of 37.50-5, by using $\hat{u}$ 
at (37.86) for that model, but of course $\hat{u}$ will in general differ from that in the full 
model considered so far. Thus each of these Residual SS will be too large if $\hat{u}$ for 
the full model is used, since it will not be the correct (minimizing) $\hat{u}$. Hence the 
other SS in the AV table need correction by the difference between the subtractive 
corrections to the corresponding Residual SS (or by a single subtractive correction if 
one of the latter is the Residual SS for the full model). Correspondingly, degrees of 
freedom must be corrected by the difference between two adjustments of the form 
discussed in 37.55, but this difference will often be zero. 

In the third of his four papers, Wilkinson (1957-60) gives an explicit method of 
obtaining the subtractive corrections to the other Residual SS, and hence the other SS 
in the AV table. Fortunately, as Yates (1933) pointed out, the latter corrections, being 
generally differences of quantities of the same sign, are often small, and the unadjusted 
SS may be used as approximations. 

37.57 Tocher (1952) gives similar methods of analysis for other types of “ spoilt ” 
experiments, namely those in which some observations are irretrievably mixed up and 
those in which some observations are unwittingly duplicated—Plackett (1950) also 
discusses the latter situation. 


Afifi and Elashoff (1966) extend the analysis to the case where elements of X, as well 
as of y, are missing—cf. Exercise 37.21. 


EXERCISES 


37.1 There are $G$ groups of observations, and all observations within a group are normally 
distributed with common mean and common variance $\sigma_g^2$, the model (37.1) holding except for 
the homoscedasticity condition below it. Consider the sets of constraints 
$C_1$: all the $\sigma_g^2$ are equal ($G-1$ constraints); 
$C_2$: $r$ of the $k$ parameters in $\theta$ are zero. 


Working in terms of the variable я; = ys, so that (37.3) reduces to 
La(z | 6, 6%) = (61(2))7^, 
show that (37.8) gives 
Ща | A») = Le | AL (ә). (а) 


where $l_1$ is the LR test statistic defined at (24.40), and $l_2 = \left\{1 + \frac{r}{n-k} F\right\}^{-n/2}$, where $F$ is the 
variance-ratio test statistic defined generally at (24.99) and for this case in Example 24.8. (This 
result generalizes Exercise 24.6.) 


(Box and Cox, 1964) 


37.2 Using Exercise 23.7, show that if in (37.8) $l_1$ is distributed free of certain parameters 
for which there is a complete sufficient (vector) statistic $t$, and $l_2$ is a function of $t$ alone, $l_1$ and 
$l_2$ are stochastically independent. Apply this result to establish the independence results in 
Exercises 24.6 and 24.13. Show in Exercise 37.1 that $l_1(z)$ and $l_2(z)$ are independent when 
$C_1$ and $C_2$ both hold. 

(Cf. Hogg, 1961) 


37.3 In fitting orthogonal polynomials of degree k as in 28.16, the reduction in the total 
п 

SS associated with the term of degree у is О, = o Фу) as at (28.72), Vol. 2. Show that 
-1 


the ratios 


0+1 
== Он E 06 rh hes 
s=k-r+2 

where $Q_{k+1} = (n-k)s^2$ is the Residual SS, are all independently distributed when the regression 
coefficients $\alpha_r$ are all zero: 

(a) by using the result of Exercise 37.2; and 

(b) by using the result of Exercise 23.27. 
(This result indicates (cf. Hogg (1961)) that one may independently test the regression coefficients 
if one starts from the highest order and works downwards, “ pooling ” the associated SS of 
those adjudged zero with the Residual SS, until one is adjudged non-zero, when the process 
stops. All the tests are, of course, $t^2$ ($F$) tests, and the overall test has size $1 - (1-\alpha)^k \approx k\alpha$ 
if a test of size $\alpha$ is used at each stage. T. W. Anderson (1962) shows under weak assumptions 
that this procedure maximizes the probability of correctly locating a non-zero coefficient.) 


37.4 In 37.10, show that for the binomial distribution of 5.4, (37.14) gives the 
variance-stabilizing transformation $u(x/n) = \arcsin\{(x/n)^{1/2}\}$, where $x/n$ is the observed proportion of 
"successes." 
(Anscombe (1948) shows that better variance stabilization is obtained 
if $x/n$ is replaced by $(x+\tfrac{3}{8})/(n+\tfrac{3}{4})$. Freeman and Tukey (1950) 
suggest $\arcsin\{(x/(n+1))^{1/2}\} + \arcsin\{((x+1)/(n+1))^{1/2}\}$ (cf. Example 37.1), 
tabulated by Mosteller and Youtz (1961). See also Laubscher 
(1961).) 

37.5 In 37.10, show that for the negative binomial distribution of 5.15, (37.14) gives the 
variance-stabilizing transformation $u(x/n) = \operatorname{arcsinh}\{(x/n)^{1/2}\}$, where $x/n$ is the observed 
proportion of "successes." 
(Anscombe (1948) shows that better variance 
stabilization occurs if $x/n$ is replaced by 
$(x+\tfrac{3}{8})/(n-\tfrac{3}{4})$. See also Laubscher (1961).) 


37.6 In Exercise 37.4, show that the alternative transformation $u(x/n) = \log\left\{\dfrac{x/n}{1 - x/n}\right\}$ 
stabilizes the variance near $p = \tfrac{1}{2}$. Show that this transformation is strictly appropriate when 
(37.9) is $D^2(\theta) = c\,\theta^2(1-\theta)^2$. 
(Cf. Bartlett, 1947a) 


37.7 For a cross-classification with unequal cell frequencies, show that if the cell means 
are analysed as single observations, their average variance may be estimated by $s^2/H$, where 
$s^2$ is the Residual MS of the original observations and $H$ is the harmonic mean of the cell 
frequencies. Hence show how an approximate AV of the cell means may be carried out. 

(Cf. Scheffé, 1959) 


37.8 Applying the method of Exercise 37.7 to the numerical data of Exercise 35.7, show 
that the approximate AV for cell means is 

                    SS      d.fr.    MS 
Between sexes      0.023      1     0.023 
Between breeds     0.020      7     0.003 
Interactions       0.006      7     0.001 
                   0.049     15 

Residual                    517     0.0017 (= 0.0227 x 0.0759) 

Compare the values of the F-ratios in this table with the exact values in Exercise 35.7. 

37.9 Show that if a linear model contains terms $\theta_s x_s^{\mu}$, $\mu \neq 0$, we have approximately 
$$x_s^{\mu} = x_s + (\mu - 1)\, x_s \log x_s.$$
Hence show how $\mu$ can be estimated. The process may be iterated. 
(Box and Tidwell, 1962) 


37.10 In 37.28, show that М = X(X'X)-!X' is invariant under any non-singular trans- 
formation W = XT. Hence, taking W’ W to be diagonal, show that т = У Му satisfies 
T 


a N-D” fg (he „(ыз (N=1)p(@+2) 
"= enn (А) +25, fL NO) C 


(Box and Watson, 1962) 


37.11 In 37.31, by considering the variance of the diagonal elements $M_{rr}$ of $M$, show that 
$m = \sum_r M_{rr}^2 \geq p^2/N$. Using the invariance property of Exercise 37.10, suppose that $X'X = I$, 
and add $(N-p)$ further columns, one of which is $N^{-1/2}\mathbf{1}$, to $X$ to form an orthogonal matrix. 
Hence show that $m < p(N-1)/N$. Show from these derivations that the lower bound to $m$, 
but not the upper, can be attained. 
(Box and Watson, 1962) 


37.12 In 37.32, show that if we choose (p+ 1) group-frequencies лу in the one-way classifica- 
tion so that 
Eque N= Dies 1 pt+4p+1 
jm мъ? thet? N Nai ^ 
Cx at (37.52) is zero and there is no correction to Vp(W) for non-normality, whatever the 
underlying distribution of the errors. If f = 1, and m = rN, n, = (1—7)N, show that Cx = 0 


when ? 
N-2 
a + (ху) }~ 1239), 


and that if N = 12, the optimal integer group-frequencies are 9 and 3. 
(Box and Watson, 1962) 


37.13 Establish (37.71-2) by writing U defined below (37.70) in the form 
U-ZXRa 
zn 
where 
e 
Ru = У уйу, 
ja 


and showing that 
Ep (Ru) = 0, Ep (Ri) = (c—1) (ha) (i, 


Ep(U) = 0, Ep(U*) = (с—1) zz (Аа) (Ri. 


and hence 
(Pitman, 1938) 


37.14 In 37.41 show that when (37.75) holds, 
$$V_P(W) = \frac{2(r-1)}{r^3(c-1)}$$
and hence by the method of 37.29 that the d.fr. of the approximate F-test should be adjusted 
to 
$$\nu_1 = (c-1) - \frac{2}{r}, \qquad \nu_2 = (r-1)\nu_1.$$
Show that when the ranks $1, 2, \ldots, c$ in each row are used as conventional numbers, with 
$R_j$ as the sum of the ranks in the jth column and $T = \sum_{j=1}^{c} \{R_j - \tfrac{1}{2} r(c+1)\}^2$, the statistic $W$ reduces 
to 
$$W = \frac{12T}{r^2 c(c^2-1)}$$
and that as $r \to \infty$, 
$$\frac{12T}{rc(c+1)}$$
has a $\chi^2$ distribution with $(c-1)$ d.fr. 
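(M. Friedman (1937); Kendall and Babington 
Smith (1939); M. Friedman (1940) compares 
the $F$ and $\chi^2$ approximations.) 


37.15 In Example 37.1, expand 
$$u_c(t) = (t + c)^{1/2}$$
in series and show that 
$$E(u_c) = (\theta + c)^{1/2} - \frac{1}{8\theta^{1/2}} + \frac{24c - 7}{128\,\theta^{3/2}} + O(\theta^{-5/2}),$$
and 
$$\operatorname{var} u_c = \frac{1}{4}\left\{1 + \frac{3 - 8c}{8\theta} + \frac{32c^2 - 52c + 17}{32\theta^2} + O(\theta^{-3})\right\},$$
so that the choice $c = \tfrac{3}{8}$ removes the term of order $\theta^{-1}$ in the variance, reducing it to 
$$\operatorname{var} u_{3/8} \approx \frac{1}{4}\left(1 + \frac{1}{16\theta^2}\right).$$
Hence show that 
$$\{E(u_c)\}^2 - c \approx \theta - \frac{1}{4} - \frac{3 - 8c}{32\theta},$$
so that if the inverse transformation is used on $\bar{u}_c$ to obtain an estimator of $\theta$, its downward 
bias is nearly constant at $\tfrac{1}{4}$. 
(The $c = \tfrac{3}{8}$ result is due to A. H. L. Johnson; cf. Anscombe (1948).) 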


37.16 In Exercise 37.15, show that the coefficients of skewness and kurtosis of u;(t) are 


1 25—48c 
P peu ice d Б) 
7 m m eo ), 


1 f, | 945—1536c ө 
= л +0(0-*), 


compared with 
$$\gamma_1 = \theta^{-1/2}, \qquad \gamma_2 = \theta^{-1},$$
for the original Poisson variable $t$. Thus, whatever $c$ is chosen, $\gamma_1$ is approximately halved 
(with changed sign) and $\gamma_2$ is unaffected to the first order. 
(Anscombe, 1948.) 


37.17 Using the result for var ue in Exercise 37.15, show that the transformation 


u$ = (t- 4-5) + (t(-1—9y 
has variance 


1 1652— 


var u$ = 1-36* 32088- 320: 1 Lo(- 9), 
so that if we choose 5 = } to give ш of Example 37.1, 
5 
а -7 
var uj = 1— 86 + 359 * 907 ). 


37.18 For the two-way cross-classification with one observation per cell, or with more than 
one if there are no interactions, show that a median test for column-effects is obtained by counting 
the number $m_j$ of observations in the jth column which exceed the row median, and forming 
a $(2 \times c)$ table like (37.76), with (37.77) as the large-sample test statistic. 

(Brown and Mood, 1951) 


37.19 For the general balanced two-way cross-classification, show that column-effects may 
be tested against interactions by finding the median in each cell and applying the test of Exercise 
37.18 to these medians. Show that the test remains distribution-free if cell frequencies differ 
between rows, but not within rows. 
(Brown and Mood, 1951) 


37.20 In Exercise 37.19, show that column-effects and interactions may be tested jointly 
by counting the number $m_{ij}$ of the $n_{ij}$ observations in the (i, j)th cell which exceed the ith row 
median and then testing that $E(m_{ij}) = \tfrac{1}{2} n_{ij}$ for all i, j. Show that this is equivalent to the 
hypothesis (cf. 33.60) that in each row of a $(r \times c \times 2)$ three-way table, the columns and layers 
are independent, leading to a large-sample $\chi^2$ test with $r(c-1)$ d.fr. 
(Brown and Mood, 1951) 


37.21 In 37.51-4, generalize by letting $y = (z', u', v', w')'$, where $z$ contains the observed y-values 
with no corresponding x-values missing, $u$ the missing y-values with no corresponding x-values 
missing, $v$ the observed y-values with some corresponding x-values missing, and $w$ the missing 
y-values with some corresponding x-values missing. $(X_z', X_u', X_v', X_w')'$ is the conformable partition of $X$, 
and $X_0$, say, is $X$ when $X_v = X_w = 0$. 
(a) Put $u = w = 0$ and replace $X$ by $X_0$. Estimate $\theta$, obtaining $\hat\theta(0)$, say; 
(b) Estimate $u$ by (37.86), with $X_0$ instead of $X$; 
(c) Re-estimate $\theta$, putting $u = \hat{u}$, $w = 0$. 


Show that this is the LS estimator. 
(Afifi and Elashoff, 1966) 


CHAPTER 38 
THE DESIGN OF EXPERIMENTS 


38.1 For the greater part of this work, we have been concerned with the problems 
arising in the analysis of observations, principally the problems of estimation and 
testing which appear in various theoretical contexts. In a very obvious way, every 
investigation of a method of analysis carries its own lesson for the future. Thus, 
e.g., when we learn that a particular method of estimation is more efficient than another, 
the immediate implication is that we should use the better method in future. However, 
this implication alone would not lead us to modify the method of making observations 
in future, but merely to modify the analysis of the observations. In this chapter and 
the two following, we shall be discussing questions of design, by which we mean 
considerations affecting the method of making, or collecting, the observations to be 
analysed. 


38.2 Design considerations are not entirely new to us. In Example 28.4 we 
discovered from the analysis of a simple linear regression problem that by choosing 
the origin of measurement and the values of the regressor in a certain way, we could 
ensure an orthogonal analysis and also minimize the sampling variances of our 
estimators. This is a design question, because it relates to how the observations are to 
be made. In the same Example, we remarked the hazard that this optimum solution 
removes the possibility of checking the assumption that the regression model was 
linear in the value of the regressor. Unless we were very sure on this point, we should 
probably “ hedge ” slightly by departing from the optimum choice of regressor values. 

Again, in 37.21, the results of robustness studies implied that equal frequencies 
should be used in all cells of an experiment. Once more, this is a design question 
since it affects the method of making observations. 


38.3 In this chapter we shall discuss questions of design as they affect experi- 
mentation, largely using the linear models and AV techniques of Chapters 35-7. In 
Chapters 39-40, we shall turn to design problems in sample surveys. The distinction 
between these fields is fairly clear-cut, and may be expressed by saying that in surveys 
we make observations on a sample taken from a finite population of individuals, whereas 
in experiments we make observations which are in principle generated by a hypothetical 
infinite population, in exactly the way that the tosses of a coin are (cf. 1.29 and 9.4, 
and Example 38.1 below). Of course, we may sometimes experiment on the members 
of a sample resulting from a survey, or even make a sample survey of the results of an 
(extensive) experiment, but the essential distinction between the two fields should 
be clear. 

Cochran (1965) gives an interesting general discussion of inferential problems 
which arise particularly in surveys rather than in controlled experiments. 

Principles of experimentation 

38.4 Classical discussions of the principles of experimentation emphasized the 
importance of varying the (supposedly) causal factors in an experiment in order to 
observe the effect upon the dependent variable being studied. In two respects, however, 
these discussions are now generally seen to have been inadequate. Firstly, they tended 
to be phrased in terms of the variation of a single causal factor at a time, rather than 
of all the causal factors in combination. Thus, J. S. Mill’s (1843) fifth canon of ex- 
perimental enquiry states: 

“Whatever phenomenon varies in any manner whenever another phenomenon 
varies in some particular manner is either a cause or an effect of that phenomenon, 
or is connected with it through some fact of causation.” 

In the light of the results of Chapter 35, we see now that a "one-at-a-time" 
approach can have no hope of evaluating the interactions between causal factors. 
Not only does this deprive us of essential knowledge of the linkages between causal 
factors: it may actually be positively misleading. For suppose that it is the purpose 
of an experiment to find which combination of ingredient A and ingredient B gives 
the highest resistance to breakage in a ceramic product. If we find the dose of ingredient 
A which gives highest resistance, and the dose of B which does so, it is by no means 
true that if we combine these values we have arrived at the optimum combination 
sought, as the reader may easily convince himself numerically. Interaction between 
the factors, which can produce effects like this, can only be studied by varying them 
simultaneously. 
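
A small numerical illustration of this point may be sketched in Python. The resistance figures are entirely hypothetical: two doses of each ingredient, with an interaction strong enough that the best dose of A found with B held at its base level, combined with the best dose of B found with A held at its base level, gives the worst of the four combinations.

```python
# Hypothetical breaking-resistance readings, indexed [dose of A][dose of B].
resistance = [
    [10, 14],  # A at dose 0; B at dose 0, 1
    [13, 9],   # A at dose 1; B at dose 0, 1
]
doses = range(2)
base_a, base_b = 0, 0  # base levels used when the other factor is varied

# "One-at-a-time": vary A with B held at its base level, and B with A held at its base level.
best_a = max(doses, key=lambda a: resistance[a][base_b])
best_b = max(doses, key=lambda b: resistance[base_a][b])
one_at_a_time = resistance[best_a][best_b]

# Varying both factors together: examine every combination.
joint_a, joint_b = max(
    ((a, b) for a in doses for b in doses),
    key=lambda ab: resistance[ab[0]][ab[1]],
)
joint_best = resistance[joint_a][joint_b]

print((best_a, best_b), one_at_a_time)
print((joint_a, joint_b), joint_best)
```

With these figures each ingredient separately appears to do better at its higher dose, yet the two higher doses together give a resistance of 9, against 14 for the true optimum.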


38.5 The second inadequacy of the classical discussions is even more radical, 
and is again illustrated by the quotation from J. S. Mill in 38.4. It arises from the danger 
of attributing to one or more of the experimental factors, effects upon the dependent 
variable which are in reality due to variations in some causal factors not included in 
the experiment. An unrecognized causal factor may (unknown to the experimenter) 
vary during the course of the experiment in such a way as to favour a particular com- 
bination of experimental factors; this combination will then appear to be highly effective, 
when it is really the unrecognized factor which is producing the good results. 

The classical discussions had no solution to this problem, and it is essential to 
realize how deep-seated and ever-present the problem is. We can never be quite 
sure that all the important, or even the most important, causal factors have been in- 
corporated in the structure of the experiment. Some may be quite unknown; others, 
although known, may wrongly be considered to be of minor importance and deliberately 
neglected. We always need to guard against the perversion of the inferences within 
an experiment by adventitious outside effects. 


Randomization 

38.6 The modern solution to the problem of 38.5 was first propounded by R. A. 
Fisher in the nineteen-twenties; cf. especially his book (Fisher (1935)). We have 
seen throughout this work that his contributions to statistical theory were remarkable 
and far-ranging. Nevertheless, it is probably no exaggeration to say that his advocacy 
of randomization in experiment design was the most important and the most influential 
of his many achievements in statistics. 


38.7 The principle of randomization is simply stated: Whenever experimental 
units are allocated to factor-combinations in an experiment, this should be done by a random 
process using equal probabilities. Thus every factor-combination will have the same 
chance of being applied to each eligible experimental unit. 

Evidently, even if we randomize in accordance with this principle, the particular 
allocation of experimental units which we make may still work in favour of particular 
factor-combinations. However, the difficulty of 38.5 no longer troubles us if we 
incorporate the process of randomization into the framework within which our inferences 
are made. The hypothetical population within which we now infer includes every 
possible pattern of allocation of experimental units which the randomization could 
have produced. Within this population, by the very nature of the randomization 
process, the effects of factors outside the experiment can show no favour to the factors 
inside it, and our inferences are free from bias. 

Even if the relationship of the dependent variable with some unsuspected causal 
factor is not recognized until after the experiment, the validity of the inferences will 
not be impaired, provided that that factor's influence was "randomized out" of the 
experiment. 
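
A randomized allocation of this kind may be sketched in Python. The function name randomize, the unit labels and the treatment labels are hypothetical; the sketch assumes that equal numbers of units are wanted on each factor-combination, and the seed is fixed only so that the allocation can be reproduced.

```python
import random


def randomize(units, treatments, seed=None):
    """Allocate units to treatments at random, with equal numbers on each treatment."""
    units = list(units)
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    rng = random.Random(seed)
    rng.shuffle(units)  # every ordering of the units is equally likely
    per_treatment = len(units) // len(treatments)
    return {
        t: units[k * per_treatment:(k + 1) * per_treatment]
        for k, t in enumerate(treatments)
    }


# Hypothetical experiment: twelve units and three factor-combinations.
allocation = randomize(range(1, 13), ["dose 0", "dose 1", "dose 2"], seed=1966)
for treatment, group in allocation.items():
    print(treatment, sorted(group))
```

Since the shuffle makes every ordering of the units equally probable, each unit has the same chance of receiving each factor-combination, which is the requirement stated in the principle.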


38.8 Thus, the problem of 38.5 is solved by changing the inferential base. Neces- 
sarily, this has the effect of changing the theoretical basis of our inference, and we shall 
develop this point shortly. First, we illustrate by a simple example. 


Example 38.1 

An experiment is to investigate the dependence of reaction-time in male automobile 
drivers upon the alcohol content of their blood. The drivers taking part are to consume 
measured doses of alcohol and, after a fixed time-lapse, to undergo a blood-alcohol 
test and certain standardized tests of reaction-times. The problem is how the drivers 
are to be allocated to the different alcohol-doses. 

This is intrinsically a regression problem, with reaction-time as dependent variable 
(y) and blood-alcohol content as regressor (x), but it should be observed that x is not 
strictly under control: we can only control the alcohol-dose (z), and we merely observe 
the value of x in each case. However, z and x will be in fairly close relationship, 
and it is reasonable to assume that to each fixed value of z, there will correspond a 
grouped set of values of x. If the z-values are sufficiently well spaced, the x-groups 
will not overlap, and we can treat the problem as a one-way classification, the classi- 
fication being indexed by the values of z. 

In this example, it is not difficult to see the problems which arise in the absence 
of a randomized allocation of alcohol-doses to drivers. Suppose, for example, that 
the drivers were allowed themselves to choose their doses of alcohol. Presumably, 
hard drinkers would choose larger doses than other drivers. Since normal drinking 
habits affect one's tolerance of alcohol, the results in the reaction-time tests might 
tend to mask the true differences between the effects of the various alcohol-doses. 

It may be argued that allowing the drivers to choose their doses in fact gives a
truer picture of what happens in driving practice. Even if this be true, it is important 
to realize that it is irrelevant to the scientific enquiry undertaken. The essential 
difference between a survey and an experiment is that the former attempts to delineate 
an existing population, while the latter is concerned to investigate relationships which 
need not be those in any precise population—this was the distinction in 38.3 above. 
We shall return to this point in 38.12; for the present example, it should be clear that 
a randomized allocation of alcohol-doses to drivers is needed to protect the inferences 
within the experiment from bias. 
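
The randomized allocation called for in Example 38.1 can be sketched in a few lines. This is a minimal illustration only, not part of the original example: the driver labels, the four dose levels and the function name `allocate_doses` are hypothetical, and equal numbers of drivers per dose are assumed for simplicity.

```python
import random


def allocate_doses(drivers, doses, seed=None):
    """Randomly allocate doses to drivers, with equal numbers of drivers per dose."""
    if len(drivers) % len(doses) != 0:
        raise ValueError("number of drivers must be a multiple of the number of doses")
    rng = random.Random(seed)
    shuffled = list(drivers)
    rng.shuffle(shuffled)  # every ordering of the drivers is equally likely
    per_dose = len(drivers) // len(doses)
    # Consecutive runs of the shuffled list receive the successive doses.
    return {driver: doses[i // per_dose] for i, driver in enumerate(shuffled)}


# Hypothetical experiment: 12 drivers and 4 dose levels (arbitrary units).
drivers = [f"driver_{i}" for i in range(1, 13)]
doses = [0, 20, 40, 60]
allocation = allocate_doses(drivers, doses, seed=1)
for driver in drivers:
    print(driver, allocation[driver])
```

Because the drivers have no say in the dose they receive, their normal drinking habits cannot be systematically associated with any one dose.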


38.9 The fact that an “outside” influence, like normal drinking habits in Example
38.1, has been removed by randomization in no way precludes us from analysing its 
effect after the (randomized) experiment is complete. In the case of Example 38.1, 
it would presumably be a difficult matter to ascertain at all accurately the normal 
drinking habits of the participants. However, consider the effect of the times of day 
at which the tests were taken. Whether or not this has been “randomized out”
of the experiment, there is nothing to prevent our subsequently carrying out a regression 
analysis of reaction-time upon time of day. If we found, say, that tests taken later 
in the day tended to have higher reaction-times, this would be a matter for investiga- 
tion and might lead to a modification of the experimental procedure on future occasions. 


38.10 Our statement of the virtues of randomization must not be taken to imply
that all randomized experiments leave nothing to be desired. Consider again the 
effect of time of day upon reaction-times, as in 38.9, and suppose that all the tests in 
the experiment were carried out at 6 p.m., the end of the day’s work of the drivers 
taking part. In effect, the factor “time of day” is then constant at one level, and the
experiment is vulnerable to the possibility that this factor interacts with blood alcohol- 
content in its effect upon reaction-times. The randomization with respect to alcohol- 
doses does not help at all in this respect, and the criticism of classical procedure in 
38.5 applies here. In fact, the randomization has been incomplete, because time of 
day has been neglected as a possible causal factor. Randomization can only confer 
inferential benefits within the sphere to which it has been applied. 


38.11 It will be clear, then, that the factors influencing the dependent variable 
in any experiment are, explicitly or implicitly, divided by the experimenter into three 
classes: 

(1) those incorporated into the structure of the experiment (alcohol-dose in Example 38.1);
(2) those “randomized out” of the experiment (normal drinking habits in Example 38.1); and
(3) those neither incorporated nor randomized out (time of day in 38.10).

It will be observed that classes (1) and (2) require positive action, affecting the 
actual layout of the experiment or of the randomization procedure employed. By 
contrast, the last of our three classes is a residual one. It is true that the experimenter 
may deliberately decide that a certain factor is negligible, so that it does not even
require randomization to remove its possible effects—in Example 38.1, the eye-colour 
of the driver is almost certainly negligible in this way. However, a factor may find 
its way into class (3) simply from being overlooked, like time of day in 38.10. 

A substantial part of the skill of the experimenter lies in his choice of factors to 
be randomized out of the experiment. If he is careful, he will randomize out all the 
factors which are suspected to be causally important but which are not actually part 
of the experimental structure. But every experimenter necessarily neglects some 
conceivably causal factors; if this were not so, the randomization procedure required 
would be impossibly complicated. Thus the choice of factors to be randomized out
is essentially a matter of judgement. 


38.12 We saw in 38.7-8 that the population, within which inferences from a 
randomized experiment may validly be made, is a hypothetical one depending upon 
the process of randomization itself. The experimenter, however, must apply his 
inferences to the real world. In Example 38.1, the hypothetical population for the 
inferences drawn from the experiment includes every possible allocation of alcohol- 
doses to the drivers taking part. How far could these inferences be extended to cover 
the larger populations of 


(a) all male automobile drivers; 
(b) all automobile drivers, females included?


If the drivers taking part in the experiment were a random sample (not necessarily 
simple random) of all male drivers, few experimenters would hesitate to generalize the 
findings of the experiment to population (a). Similarly, few experimenters would be 
rash enough to generalize to population (b) without further knowledge, perhaps from 
other experiments on female drivers. However, suppose that the drivers taking part 
were selected from the employees of one corporation. Even if they were randomly 
so selected, this would only give us comfort in generalizing to the limited population 
of all drivers employed by that corporation. If (as is commonly the case) the corpora- 
tion was not chosen by any other process than its own self-interest or its willingness 
to co-operate with some scientific body, further generalization of the results of the 
experiment is a matter of judgement. Only in so far as the experimental material is, 
or is judged to be equivalent to, a random sample from a larger population may we 
generalize the experimental results to that population. 

There is ultimately no escape from the use of judgement in this connexion, for 
there are always the problems of generalization in space (e.g. to other countries) and 
in time.


38.13 We have dealt in a very compressed way with some of the fundamental 
questions of experimental inference. For a fuller (and largely non-technical) discussion, 
the reader is recommended to read the book by D. R. Cox (1958a), as well as that of 
Fisher (1935). 

Nuisance factors: block experiments

38.14 We have seen that an essential feature of modern experimentation is the 
“randomization out” of the experiment of the effects of factors outside the experimental
structure. In practice, such effects are often produced by the fact that the
experimental units themselves cannot be physically identical. Thus, agricultural 
experiments are carried out on plots of land which (however close together) will not 
have identical fertility characteristics. What is more, as the number of plots required 
for the experiment increases, the variation in their fertility will probably also increase. 
Thus it appears advisable to use a number of small groups of similar plots, rather than 
a single large group of rather heterogeneous plots. Similarly, genetic considerations 
make it advisable to conduct many animal experiments on members of the same litter, 
but litters are generally of rather small size, so a number of litters must be used. Here 
the individual animal, like the plot in the agricultural example, is the experimental 
unit; while the litters, like the groups of similar plots, are called “blocks” of experimental
units. In this terminology we may say that many experiments are block
experiments, or even tautologically (if we allow that there may be only one block) 
that all experiments are block experiments. The effects of the variations between 
blocks need to be investigated only because we wish to eliminate them. In fact, blocks 
are a “nuisance” factor in the experiment.

We proceed to a formal investigation of block experiments, based on that of Tocher
(1952).


38.15 Suppose that an experiment is carried out with the experimental units in 
b blocks, each containing k (>1) units, so that there are bk = n observations in all. 
The experiment is to investigate a cross-classification or hierarchical classification 
(or a mixture of these) of certain factors of interest. Suppose that there are t distinct 
cells in the classification; for a two-way classification with r rows and c columns, e.g., 
we have t = rc. We shall call these the t “treatments.” The problem is how to
allocate the t treatments to the k units in each block.

We shall assume that no treatment is to be allocated to more than one unit in each 
block. This is a reasonable assumption, since there seems little point in duplicating 
a treatment within a block, rather than using it in a further block, if more observations 
are required. This assumption(*) implies that t ≥ k.


38.16 The experiment may be completely described by defining a treatment
matrix of order (k × t) for each block. For the jth block, $\mathbf{t}_j$ has its (l, i)th element
equal to 1 or 0 according as the ith treatment is or is not allocated to the lth unit in
that block. Evidently, there will be one non-zero element in each row (since one treatment
is allocated to each unit) and no more than one in each column (because of the
last assumption in 38.15).

If we require only to describe the allocation of treatments to blocks, without reference
to their allocation to units within blocks, we can condense the information from the
b treatment matrices $\mathbf{t}_j$ into the incidence matrix of the experiment, $\mathbf{n}$, of order (t × b),
which has its (i, j)th element $n_{ij}$ equal to 1 or 0 according to whether or not the ith
treatment occurs in the jth block.

(*) This assumption, which is satisfied by all well-known experiment designs, may be relaxed
(cf. Tocher (1952)).


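38.17 If $\mathbf{n}_j$ is the jth column of $\mathbf{n}$, and $\mathbf{1}_p$ is a (p × 1) vector of units, we have
$$\mathbf{t}_j'\mathbf{1}_k = \mathbf{n}_j, \tag{38.1}$$
for this is simply summing each column of $\mathbf{t}_j$. Also, since all entries in $\mathbf{t}_j$ are 0 or 1,
we have
$$\mathbf{t}_j'\mathbf{t}_j = \operatorname{diag}(\mathbf{n}_j), \tag{38.2}$$
where diag (z) means a diagonal matrix with the vector z as diagonal.
If the ith treatment occurs $r_i$ (> 0) times in the experiment as a whole, and $\mathbf{r}$ is the
(t × 1) vector of the $r_i$, summing each row of $\mathbf{n}$ gives
$$\mathbf{n}\mathbf{1}_b = \mathbf{r}, \tag{38.3}$$
and summing each column gives
$$\mathbf{n}'\mathbf{1}_t = k\mathbf{1}_b, \tag{38.4}$$
since there are k units in each block. Further, (38.2-3) give
$$\sum_j \mathbf{t}_j'\mathbf{t}_j = \sum_j \operatorname{diag}(\mathbf{n}_j) = \operatorname{diag}\Big(\sum_j \mathbf{n}_j\Big) = \operatorname{diag}(\mathbf{n}\mathbf{1}_b) = \operatorname{diag}(\mathbf{r}). \tag{38.5}$$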
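
The matrix identities of 38.16-17 are easy to check numerically. The sketch below builds the treatment matrices and the incidence matrix for a small design and asserts (38.1) to (38.5). The design itself (4 treatments in 4 blocks of 3 units) and the helper names are assumptions made for illustration; they do not come from the text.

```python
def treatment_matrix(block, t):
    """k x t matrix t_j: entry (l, i) is 1 if unit l of the block receives treatment i."""
    return [[1 if treatment == i else 0 for i in range(t)] for treatment in block]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def diag(v):
    return [[v[i] if i == j else 0 for j in range(len(v))] for i in range(len(v))]


# Hypothetical design: t = 4 treatments (0..3) in b = 4 blocks of k = 3 units.
# blocks[j][l] is the treatment given to unit l of block j.
blocks = [[0, 1, 2], [3, 0, 1], [2, 3, 0], [1, 2, 3]]
t, b, k = 4, len(blocks), len(blocks[0])

t_mats = [treatment_matrix(block, t) for block in blocks]
# Incidence matrix n (t x b): n[i][j] = 1 if treatment i occurs in block j.
n = [[1 if i in block else 0 for block in blocks] for i in range(t)]
n_cols = transpose(n)            # n_cols[j] is the column n_j
r = [sum(row) for row in n]      # (38.3): row sums of n give r

ones_k = [[1] for _ in range(k)]
for tj, nj in zip(t_mats, n_cols):
    assert [row[0] for row in matmul(transpose(tj), ones_k)] == nj   # (38.1)
    assert matmul(transpose(tj), tj) == diag(nj)                     # (38.2)

assert [sum(col) for col in n_cols] == [k] * b                       # (38.4)

total = [[0] * t for _ in range(t)]
for tj in t_mats:
    product = matmul(transpose(tj), tj)
    total = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(total, product)]
assert total == diag(r)                                              # (38.5)
print("r =", r)
```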


38.18 In accordance with the spirit of our earlier discussion, the allocation of 
treatments to units will be randomized independently within each block, but we shall 
not for the present consider the effect of this within-blocks randomization upon the 
inferences drawn from the experiment—we return to this in 38.41. Here, we regard 
the randomization as a general precautionary measure against bias, and we conduct 
our analysis in terms of the linear model (Model I) familiar from earlier chapters. 


Linear model for block experiments 
38.19 An obvious linear model for a block experiment is
$$E(y_{ij}) = \tau_i + \beta_j, \tag{38.6}$$
where the $\tau_i$ are treatment effects and the $\beta_j$ are block effects. Here, we are assuming
that treatment and block effects are additive, with no interactions.

We saw at (19.19) that the only linear functions of parameters which can be unbiassedly
estimated by linear functions of the observations are linear combinations
of their expectations. Thus, in (38.6), only linear combinations of the $(\tau_i + \beta_j)$ can
be so estimated, as is obvious from the fact that any constant added to all the $\tau_i$ and
subtracted from all the $\beta_j$ would leave (38.6) unaffected. We resolve this lack of identifiability
of the $\tau_i$, as we did in the singular model in 19.13-15, Vol. 2, and Example
19.9, by introducing a linear constraint upon the parameters. Since the block effects
$\beta_j$ are nuisance parameters, it is natural to impose the constraint upon them alone, in
the form
$$\sum_{j=1}^{b} \beta_j = 0.$$
In effect, we add to the n $y_{ij}$ a dummy random variable (zero), whose “expectation”
is the sum of the $\beta_j$, and this enables us to disentangle the unwanted block effects
from the treatment effects in which we are interested.


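38.20 Because of the constraint, we now require only (b−1) parameters for a
non-singular representation of the block effects. Call these $\alpha_1, \alpha_2, \ldots, \alpha_{b-1}$, and
array them as a vector $\boldsymbol{\alpha}$. We may obtain $\boldsymbol{\alpha}$ from the vector $\boldsymbol{\beta}$ of the $\beta_j$ by defining
an orthogonal (b × b) matrix $\mathbf{U}_b$, whose first row is $b^{-1/2}\mathbf{1}_b'$ and the remaining (b−1)
rows arbitrary, say $\mathbf{u}$. Then
$$\mathbf{U}_b\boldsymbol{\beta} = \begin{pmatrix} 0 \\ \mathbf{u}\boldsymbol{\beta} \end{pmatrix}, \tag{38.7}$$
since the constraint on the $\beta_j$ is
$$\mathbf{1}_b'\boldsymbol{\beta} = 0. \tag{38.8}$$
Because $\mathbf{U}_b$ is orthogonal, $\mathbf{U}_b^{-1} = \mathbf{U}_b'$, so (38.7) gives
$$\boldsymbol{\beta} = \mathbf{U}_b'\begin{pmatrix} 0 \\ \mathbf{u}\boldsymbol{\beta} \end{pmatrix} = \mathbf{u}'\mathbf{u}\boldsymbol{\beta}. \tag{38.9}$$
Thus if we define
$$\boldsymbol{\alpha} = \mathbf{u}\boldsymbol{\beta}, \tag{38.10}$$
(38.7) and (38.9) become
$$\mathbf{U}_b\boldsymbol{\beta} = \begin{pmatrix} 0 \\ \boldsymbol{\alpha} \end{pmatrix}, \qquad \boldsymbol{\beta} = \mathbf{u}'\boldsymbol{\alpha}. \tag{38.11}$$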


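38.21 We now proceed to the formal LS solution for block experiments. We
write $\mathbf{y}_j$ for the vector of the k observations in the jth block, $\mathbf{y}$ for the vector containing
all the $\mathbf{y}_j$, and we partition $\mathbf{u}$ into its column vectors $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_b$, each of order
(b−1). Then (38.6) may be written
$$\mathbf{y} = \mathbf{X}\boldsymbol{\theta} + \boldsymbol{\epsilon}, \tag{38.12}$$
where $\mathbf{X}$ and $\boldsymbol{\theta}$ are conformably partitioned, with
$$\underset{bk \times (t+b-1)}{\mathbf{X}} = \begin{pmatrix} \mathbf{t}_1 & \mathbf{1}_k\mathbf{u}_1' \\ \mathbf{t}_2 & \mathbf{1}_k\mathbf{u}_2' \\ \vdots & \vdots \\ \mathbf{t}_b & \mathbf{1}_k\mathbf{u}_b' \end{pmatrix}, \tag{38.13}$$
$$\underset{(t+b-1) \times 1}{\boldsymbol{\theta}} = \begin{pmatrix} \boldsymbol{\tau} \\ \boldsymbol{\alpha} \end{pmatrix}. \tag{38.14}$$

The errors in (38.12) are as usual assumed to have mean zero, variance $\sigma^2$, and to
be uncorrelated.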


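38.22 From (38.13), we have
$$\mathbf{X}'\mathbf{X} = \begin{pmatrix} \sum_j \mathbf{t}_j'\mathbf{t}_j & \sum_j \mathbf{t}_j'\mathbf{1}_k\mathbf{u}_j' \\ \sum_j \mathbf{u}_j\mathbf{1}_k'\mathbf{t}_j & \sum_j \mathbf{u}_j\mathbf{1}_k'\mathbf{1}_k\mathbf{u}_j' \end{pmatrix} = \begin{pmatrix} \operatorname{diag}(\mathbf{r}) & \mathbf{n}\mathbf{u}' \\ \mathbf{u}\mathbf{n}' & k\mathbf{I}_{b-1} \end{pmatrix} \tag{38.15}$$
on using (38.5), (38.1) and the relations $\mathbf{1}_k'\mathbf{1}_k = k$, $\mathbf{u}\mathbf{u}' = \mathbf{I}_{b-1}$. We may now invert
(38.15). Writing
$$\boldsymbol{\Omega} = \{\operatorname{diag}(\mathbf{r}) - \mathbf{n}\mathbf{u}'\mathbf{u}\mathbf{n}'/k\}^{-1}, \tag{38.16}$$
we find
$$(\mathbf{X}'\mathbf{X})^{-1} = \begin{pmatrix} \boldsymbol{\Omega} & -\boldsymbol{\Omega}\mathbf{n}\mathbf{u}'/k \\ -\mathbf{u}\mathbf{n}'\boldsymbol{\Omega}/k & (\mathbf{I}_{b-1} + \mathbf{u}\mathbf{n}'\boldsymbol{\Omega}\mathbf{n}\mathbf{u}'/k)/k \end{pmatrix}, \tag{38.17}$$
as may be verified by multiplication of (38.15) and (38.17). Also
$$\mathbf{X}'\mathbf{y} = \begin{pmatrix} \sum_j \mathbf{t}_j'\mathbf{y}_j \\ \sum_j \mathbf{u}_j\mathbf{1}_k'\mathbf{y}_j \end{pmatrix} = \begin{pmatrix} \mathbf{T} \\ \mathbf{u}\mathbf{B} \end{pmatrix}, \tag{38.18}$$
where $\mathbf{T} = (T_1, \ldots, T_t)'$ and $T_i$ is the total of y for all units receiving the ith treatment,
while $\mathbf{B} = (B_1, \ldots, B_b)'$, where $B_j = \mathbf{1}_k'\mathbf{y}_j$ is the total of y for all units in the jth block.

From (38.17-18), the LS estimators are
$$\hat{\boldsymbol{\theta}} = \begin{pmatrix} \hat{\boldsymbol{\tau}} \\ \hat{\boldsymbol{\alpha}} \end{pmatrix} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y},$$
where
$$\hat{\boldsymbol{\tau}} = \boldsymbol{\Omega}(\mathbf{T} - \mathbf{n}\mathbf{u}'\mathbf{u}\mathbf{B}/k) \tag{38.19}$$
and
$$\hat{\boldsymbol{\alpha}} = \mathbf{u}(\mathbf{B} - \mathbf{n}'\hat{\boldsymbol{\tau}})/k. \tag{38.20}$$
From (38.20) and (38.11), we then obtain for the original block parameters $\boldsymbol{\beta}$,
$$\hat{\boldsymbol{\beta}} = \mathbf{u}'\hat{\boldsymbol{\alpha}} = \mathbf{u}'\mathbf{u}(\mathbf{B} - \mathbf{n}'\hat{\boldsymbol{\tau}})/k. \tag{38.21}$$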


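38.23 We now simplify the estimators (38.19) and (38.21) for computational
purposes. First, since $\mathbf{U}_b$ in 38.20 is orthogonal, we have
$$\mathbf{I}_b = \mathbf{U}_b'\mathbf{U}_b = \begin{pmatrix} b^{-1/2}\mathbf{1}_b & \mathbf{u}' \end{pmatrix}\begin{pmatrix} b^{-1/2}\mathbf{1}_b' \\ \mathbf{u} \end{pmatrix} = \mathbf{1}_b\mathbf{1}_b'/b + \mathbf{u}'\mathbf{u},$$
so
$$\mathbf{u}'\mathbf{u} = \mathbf{I}_b - \mathbf{1}_b\mathbf{1}_b'/b. \tag{38.22}$$
Substitution of (38.22) into (38.16) gives, remembering (38.3),
$$\boldsymbol{\Omega} = \{\operatorname{diag}(\mathbf{r}) - \mathbf{n}\mathbf{n}'/k + \mathbf{r}\mathbf{r}'/(bk)\}^{-1}, \tag{38.23}$$
and (38.19) similarly becomes
$$\hat{\boldsymbol{\tau}} = \boldsymbol{\Omega}\{\mathbf{T} - \mathbf{n}\mathbf{B}/k + \mathbf{r}G/(bk)\}, \tag{38.24}$$
where we have written
$$G = \mathbf{1}_b'\mathbf{B} = \mathbf{1}_t'\mathbf{T} \tag{38.25}$$
for the grand total of all the observations y. (38.24) may be further simplified, for
(38.23) gives
$$\boldsymbol{\Omega}^{-1}\mathbf{1}_t = \mathbf{r} - \mathbf{n}(\mathbf{n}'\mathbf{1}_t)/k + \mathbf{r}(\mathbf{r}'\mathbf{1}_t)/(bk). \tag{38.26}$$
Using (38.3-4) and their consequence,
$$\mathbf{r}'\mathbf{1}_t = bk, \tag{38.27}$$
the last two terms in (38.26) cancel, and $\boldsymbol{\Omega}^{-1}\mathbf{1}_t = \mathbf{r}$. Thus
$$\boldsymbol{\Omega}\mathbf{r} = \mathbf{1}_t, \tag{38.28}$$
and (38.24) becomes
$$\hat{\boldsymbol{\tau}} = \boldsymbol{\Omega}(\mathbf{T} - \mathbf{n}\mathbf{B}/k) + \mathbf{1}_t G/(bk). \tag{38.29}$$

We shall see in 38.25 that the last term on the right of (38.29) essentially estimates the
general mean, which we have omitted from our linear model.
Now we substitute (38.22) into (38.21) to obtain
$$\hat{\boldsymbol{\beta}} = (\mathbf{I}_b - \mathbf{1}_b\mathbf{1}_b'/b)(\mathbf{B} - \mathbf{n}'\hat{\boldsymbol{\tau}})/k. \tag{38.30}$$
This can again be simplified, for, using (38.25) and (38.3),
$$\mathbf{1}_b'(\mathbf{B} - \mathbf{n}'\hat{\boldsymbol{\tau}}) = G - \mathbf{r}'\hat{\boldsymbol{\tau}} = G - \mathbf{r}'\boldsymbol{\Omega}(\mathbf{T} - \mathbf{n}\mathbf{B}/k) - \mathbf{r}'\mathbf{1}_t G/(bk)$$
on substituting (38.29). If we now use (38.25), (38.4) and (38.27), this expression
reduces to zero. Thus we have
$$\mathbf{r}'\hat{\boldsymbol{\tau}} = G, \tag{38.31}$$
and (38.30) becomes simply
$$\hat{\boldsymbol{\beta}} = (\mathbf{B} - \mathbf{n}'\hat{\boldsymbol{\tau}})/k. \tag{38.32}$$
Apart from the calculation of $\boldsymbol{\Omega}$ at (38.23), the treatment parameters estimator at
(38.29) and the block parameters estimator at (38.32) require only the vectors of treatment
and block totals, $\mathbf{T}$ and $\mathbf{B}$, and the grand total G obtained from either of these by
(38.25).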


AV for block experiments 
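38.24 In order to construct the AV for the block experiment, we now require
the Residual SS. This is
$$S_0 = (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\theta}})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\theta}}) = \mathbf{y}'\mathbf{y} - \mathbf{y}'\mathbf{X}\hat{\boldsymbol{\theta}},$$
and using (38.18) and (38.11) this is
$$S_0 = \mathbf{y}'\mathbf{y} - \begin{pmatrix} \mathbf{T} \\ \mathbf{u}\mathbf{B} \end{pmatrix}'\begin{pmatrix} \hat{\boldsymbol{\tau}} \\ \hat{\boldsymbol{\alpha}} \end{pmatrix} = \mathbf{y}'\mathbf{y} - \mathbf{T}'\hat{\boldsymbol{\tau}} - \mathbf{B}'\mathbf{u}'\hat{\boldsymbol{\alpha}} = \mathbf{y}'\mathbf{y} - \mathbf{T}'\hat{\boldsymbol{\tau}} - \mathbf{B}'\hat{\boldsymbol{\beta}}. \tag{38.33}$$
In general, the AV is non-orthogonal, since the off-diagonal matrices in (38.17) are
non-null. We must therefore, as in Example 35.4, 35.38 and 35.43, find the Residual
SS, say $S_1$, when there are no treatment differences, only block parameters and a single
treatment parameter being estimated. The difference $S_1 - S_0$ will then be the SS
attributable to treatment differences.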


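38.25 We thus require to modify $\hat{\boldsymbol{\tau}}$ so that it is of form $\hat{\tau}\mathbf{1}_t$. In (38.24), this gives
$$\hat{\tau}\mathbf{1}_t = \boldsymbol{\Omega}\{\mathbf{T} - \mathbf{n}\mathbf{B}/k + \mathbf{r}G/(bk)\}.$$
We substitute $\boldsymbol{\Omega}\mathbf{r}$ for $\mathbf{1}_t$ from (38.28). Premultiplying by $\boldsymbol{\Omega}^{-1}$, transposing and postmultiplying
by $\mathbf{1}_t$, we have $\hat{\tau}\mathbf{r}'\mathbf{1}_t = \{\mathbf{T} - \mathbf{n}\mathbf{B}/k + \mathbf{r}G/(bk)\}'\mathbf{1}_t$. The first two terms on
the right cancel, because (38.25) and (38.4) give
$$\mathbf{1}_t'(\mathbf{T} - \mathbf{n}\mathbf{B}/k) = 0. \tag{38.34}$$
We thus find, using the remaining term on the right,
$$\hat{\tau} = G/(bk)$$
or
$$\hat{\tau}\mathbf{1}_t = \mathbf{1}_t G/(bk), \tag{38.35}$$
a result intuitively obvious, since in the absence of treatment differences, the general
mean G/(bk) will be the estimator for all treatments. We now find from (38.32),
using (38.35) and (38.4), that in this case
$$\hat{\boldsymbol{\beta}} = (\mathbf{B} - \mathbf{1}_b G/b)/k. \tag{38.36}$$
Substituting (38.35-6) into (38.33), we find using (38.25) that
$$S_1 = \mathbf{y}'\mathbf{y} - \mathbf{B}'\mathbf{B}/k, \tag{38.37}$$
so that (as the reader is asked to verify in Exercise 38.1)
$$S_1 - S_0 = \mathbf{T}'\hat{\boldsymbol{\tau}} + \mathbf{B}'\hat{\boldsymbol{\beta}} - \mathbf{B}'\mathbf{B}/k = (\mathbf{T} - \mathbf{n}\mathbf{B}/k)'\boldsymbol{\Omega}(\mathbf{T} - \mathbf{n}\mathbf{B}/k) \tag{38.38}$$
is the SS for treatment differences, while from (38.37) the combined SS for blocks
and the general mean is $\mathbf{B}'\mathbf{B}/k$.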


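38.26 We may now display all these results in the AV table:

AV table for the general block experiment

$$\begin{array}{lll}
\text{Source of variation} & \text{SS} & \text{D.fr.} \\
\hline
\text{Treatment differences (allowing for block effects)} & \mathbf{T}'\hat{\boldsymbol{\tau}} + \mathbf{B}'\hat{\boldsymbol{\beta}} - \mathbf{B}'\mathbf{B}/k = (\mathbf{T} - \mathbf{n}\mathbf{B}/k)'\boldsymbol{\Omega}(\mathbf{T} - \mathbf{n}\mathbf{B}/k) & t-1 \\
\text{Block effects (ignoring treatment differences)} & \mathbf{B}'\mathbf{B}/k - G^2/(bk) & b-1 \\
\text{Residual} & \mathbf{y}'\mathbf{y} - \mathbf{T}'\hat{\boldsymbol{\tau}} - \mathbf{B}'\hat{\boldsymbol{\beta}} & bk-b-t+1 \\
\text{General mean} & G^2/(bk) & 1 \\
\hline
\text{TOTAL} & \mathbf{y}'\mathbf{y} & bk
\end{array} \tag{38.39}$$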


The d.fr. for the Residual are obtained as a difference. (38.39) makes it clear that 
our analysis has simply separated off, from the d.fr. remaining after the (b — 1) linearly 
independent block parameters $\boldsymbol{\alpha}$ and the general mean are allowed for, (t−1) d.fr.
for treatment differences. 
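
The computations of 38.23-26 can be traced on a small non-orthogonal example. The sketch below is illustrative only: the design (3 treatments in 3 blocks of 2 units), the observations and all variable names are invented, and exact rational arithmetic is used so that the identities hold without rounding. It evaluates the matrix of (38.23), the estimators (38.29) and (38.32), and the sums of squares in table (38.39).

```python
from fractions import Fraction


def invert(matrix):
    """Invert a square matrix of Fractions by Gauss-Jordan elimination."""
    size = len(matrix)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        # Raises StopIteration if the matrix is singular (no usable pivot).
        pivot = next(row for row in range(col, size) if aug[row][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for row in range(size):
            if row != col:
                factor = aug[row][col]
                aug[row] = [x - factor * p for x, p in zip(aug[row], aug[col])]
    return [row[size:] for row in aug]


# Hypothetical data: t = 3 treatments in b = 3 blocks of k = 2 units.
blocks = [[0, 1], [0, 2], [1, 2]]      # treatment applied to each unit
y = [[10, 12], [11, 15], [14, 17]]     # made-up observations, same layout
t, b, k = 3, len(blocks), len(blocks[0])

n = [[int(i in block) for block in blocks] for i in range(t)]   # incidence matrix
r = [sum(row) for row in n]
T = [sum(obs for block, ys in zip(blocks, y)
         for trt, obs in zip(block, ys) if trt == i) for i in range(t)]
B = [sum(ys) for ys in y]
G = sum(B)

# (38.23): Omega = {diag(r) - nn'/k + rr'/(bk)}^{-1}
omega = invert([[Fraction(r[i] if i == h else 0)
                 - Fraction(sum(n[i][j] * n[h][j] for j in range(b)), k)
                 + Fraction(r[i] * r[h], b * k)
                 for h in range(t)] for i in range(t)])

# adj = T - nB/k, the treatment totals adjusted for block totals
adj = [T[i] - Fraction(sum(n[i][j] * B[j] for j in range(b)), k) for i in range(t)]
tau = [sum(omega[i][h] * adj[h] for h in range(t)) + Fraction(G, b * k)
       for i in range(t)]                                           # (38.29)
beta = [(B[j] - sum(n[i][j] * tau[i] for i in range(t))) / k
        for j in range(b)]                                          # (38.32)

yy = sum(obs * obs for ys in y for obs in ys)
ss_mean = Fraction(G * G, b * k)
ss_treatments = sum(adj[i] * omega[i][h] * adj[h]
                    for i in range(t) for h in range(t))            # (38.38)
ss_blocks = Fraction(sum(Bj * Bj for Bj in B), k) - ss_mean
ss_residual = (yy - sum(Ti * ti for Ti, ti in zip(T, tau))
               - sum(Bj * bj for Bj, bj in zip(B, beta)))           # (38.33)

print("tau  =", tau)
print("beta =", beta)
print("SS:", ss_treatments, ss_blocks, ss_residual, ss_mean, "total", yy)
```

The four sums of squares add to the total y'y, and the residual carries bk − b − t + 1 = 1 degree of freedom in this tiny design.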


The design of block experiments 
38.27 The crucial computation in the preceding analysis is that of $\boldsymbol{\Omega}$ at (38.23),
requiring a matrix inversion. $\boldsymbol{\Omega}$ depends upon the incidence matrix $\mathbf{n}$ and the vector $\mathbf{r}$
obtained from it by (38.3). Since $\sigma^2(\mathbf{X}'\mathbf{X})^{-1}$ is, by (19.16), the dispersion matrix
of $\hat{\boldsymbol{\theta}}$, (38.17) shows that
$$\mathbf{V}(\hat{\boldsymbol{\tau}}) = \sigma^2\boldsymbol{\Omega}. \tag{38.40}$$
We now seek to design the experiment (i.e. choose $\mathbf{n}$) so that the dispersion matrix
of the treatment parameter estimators has some desired form. In determining this 
form we shall impose intuitively acceptable conditions, which will lead us to the most 
commonly used designs. Kiefer (1958, 1959) discusses various concepts of optimality 
in experiment design, and shows that designs which are symmetrical between all treat- 
ments (such as those to which we shall be led by imposing symmetrical conditions upon 
all treatment parameter estimators) are optimum in most of the senses. Karlin and 
Studden (1966) review and extend the general theory. 

In particular, these symmetrical designs minimize the generalized variance(*)
(the determinant of (38.40)) and have optimum local power properties in testing the
equality of all treatment parameters. Remarkably enough, these optimum properties
are not retained if the design itself may be chosen by a random procedure, although
the generalized variance is still minimized as block size k → ∞.

For some theory and discussion of such random allocation (including random balance) 
designs, cf. Dempster (1960-1), Satterthwaite (1959), Budne (1959), the discussion 
of the last two papers by Youden et al. (1959), and Anscombe (1959). 


38.28 If we choose $\mathbf{n}$ to make $\boldsymbol{\Omega}$ diagonal, the treatment parameter estimators
will be uncorrelated (orthogonal) and in addition the required matrix inversion will
be trivial. $\boldsymbol{\Omega}$ will be diagonal if and only if its inverse is, and since the first term in
the braces in (38.23) is already diagonal, we require only that $\mathbf{M} = \mathbf{n}\mathbf{n}' - \mathbf{r}\mathbf{r}'/b$ should
be diagonal. Now, using (38.4) and (38.27),
$$\mathbf{1}_t'\mathbf{M}\mathbf{1}_t = \mathbf{1}_t'(\mathbf{n}\mathbf{n}' - \mathbf{r}\mathbf{r}'/b)\mathbf{1}_t = k\mathbf{1}_b'k\mathbf{1}_b - (bk)^2/b = 0. \tag{38.41}$$
Thus the sum of all the elements of $\mathbf{M}$ is always zero. If $\mathbf{M}$ is to be a diagonal
matrix, its off-diagonal elements must all be zero, and therefore the sum of its diagonal
elements, say $M_{ii}$, must be zero, i.e.
$$0 = \sum_{i=1}^{t} M_{ii} = \sum_{i=1}^{t}\Big(\sum_{j=1}^{b} n_{ij}^2 - r_i^2/b\Big) = \sum_{i=1}^{t}\sum_{j=1}^{b}(n_{ij} - r_i/b)^2 \tag{38.42}$$
on using (38.3). Thus we must have
$$n_{ij} - r_i/b = 0, \quad \text{all } i, j, \tag{38.43}$$
or in matrix terms
$$\mathbf{n} = \mathbf{r}\mathbf{1}_b'/b. \tag{38.44}$$
The condition for $\boldsymbol{\Omega}$ to be diagonal is thus that every block contains the same set of
treatments, and every treatment is applied to a total number of units which is a multiple
of b. Since each treatment occurs 0 or 1 times in any block, and no treatment occurs
0 times in the experiment as a whole, this implies that each treatment occurs once in
each block. Thus k = t and
$$\mathbf{r} = b\mathbf{1}_t, \tag{38.45}$$
so (38.44) becomes simply
$$\mathbf{n} = \mathbf{1}_t\mathbf{1}_b'. \tag{38.46}$$


(*) Within any design, we know from 19.8, Vol. 2, that the LS estimators minimize the
generalized variance among linear estimators. The result above refers to the choice between
designs.

This is the incidence matrix of a randomized blocks design.


Exercises 38.2-3 give related results. 


Randomized blocks designs 

38.29 We have therefore arrived, by a lengthy route, at the randomized blocks
designs which were briefly mentioned in 36.39, when we first discussed the randomized
allocation of units to treatments. The structure of a randomized block experiment
is extremely simple: t treatments are randomly allocated to the k = t units in each
of b blocks. Because of the diagonality of $\boldsymbol{\Omega}$ from which the designs were deduced
in 38.28, the LS estimators of the parameters and the general AV table for block
experiments at (38.39) simplify greatly. Using (38.45-6) in (38.23) we find
$$\boldsymbol{\Omega} = \mathbf{I}_t/b, \tag{38.47}$$
while use of (38.46) and (38.25) in (38.29) and (38.32) gives
$$\hat{\boldsymbol{\tau}} = \mathbf{T}/b, \tag{38.48}$$
$$\hat{\boldsymbol{\beta}} = \mathbf{B}/t - \mathbf{1}_b G/(bt). \tag{38.49}$$
Similarly, the SS for treatment differences (38.38) becomes
$$S_1 - S_0 = \mathbf{T}'\mathbf{T}/b - G^2/(bt) \tag{38.50}$$
and the Residual SS (38.33) becomes
$$S_0 = \mathbf{y}'\mathbf{y} - \mathbf{T}'\mathbf{T}/b - \mathbf{B}'\mathbf{B}/t + G^2/(bt). \tag{38.51}$$

The reader is asked to verify these formulae in Exercise 38.4.

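38.30 We now display the simplified AV table:

AV table for a randomized blocks experiment

$$\begin{array}{lll}
\text{Source of variation} & \text{SS} & \text{D.fr.} \\
\hline
\text{Treatment differences} & \mathbf{T}'\mathbf{T}/b - G^2/(bt) & t-1 \\
\text{Block differences} & \mathbf{B}'\mathbf{B}/t - G^2/(bt) & b-1 \\
\text{Residual} & \mathbf{y}'\mathbf{y} - \mathbf{T}'\mathbf{T}/b - \mathbf{B}'\mathbf{B}/t + G^2/(bt) & (t-1)(b-1) \\
\text{General mean} & G^2/(bt) & 1 \\
\hline
\text{TOTAL} & \mathbf{y}'\mathbf{y} & bt
\end{array} \tag{38.52}$$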


Comparison with (38.39) reveals the extent of the simplification. 
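
As a numerical illustration of (38.48) to (38.52), the following sketch computes the estimators and the AV table for a randomized blocks layout. The observations (3 blocks, 4 treatments) and the variable names are invented for the purpose; exact fractions are used so that the residual from (38.51) can be compared directly with the sum of squared deviations from the fitted values.

```python
from fractions import Fraction

# Hypothetical observations: y[j][i] is the response to treatment i in block j
# (b = 3 blocks, t = 4 treatments, each treatment once per block).
y = [[12, 15, 14, 19],
     [10, 14, 11, 17],
     [13, 17, 15, 21]]
b, t = len(y), len(y[0])

T = [sum(row[i] for row in y) for i in range(t)]   # treatment totals
B = [sum(row) for row in y]                        # block totals
G = sum(B)                                         # grand total
yy = sum(v * v for row in y for v in row)
ss_mean = Fraction(G * G, b * t)

tau_hat = [Fraction(Ti, b) for Ti in T]                          # (38.48)
beta_hat = [Fraction(Bj, t) - Fraction(G, b * t) for Bj in B]    # (38.49)

tt_over_b = Fraction(sum(Ti * Ti for Ti in T), b)
bb_over_t = Fraction(sum(Bj * Bj for Bj in B), t)
ss_treatments = tt_over_b - ss_mean                              # (38.50)
ss_blocks = bb_over_t - ss_mean
ss_residual = yy - tt_over_b - bb_over_t + ss_mean               # (38.51)

# Direct check: residual SS as squared deviations from the fitted values.
direct_residual = sum((y[j][i] - tau_hat[i] - beta_hat[j]) ** 2
                      for j in range(b) for i in range(t))

table = [("Treatment differences", ss_treatments, t - 1),
         ("Block differences", ss_blocks, b - 1),
         ("Residual", ss_residual, (t - 1) * (b - 1)),
         ("General mean", ss_mean, 1),
         ("TOTAL", Fraction(yy), b * t)]
for source, ss, dfr in table:
    print(f"{source:<22} {float(ss):10.3f} {dfr:4d}")
```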

The symmetry of the table (38.52) as between treatments (the symbols T, t) and
blocks (the symbols B, b) makes it clear that blocks are in fact being treated as a (nuisance)
factor in the analysis, for (38.52) is formally identical with the AV table for the two-way
cross-classification with one observation per cell (cf. Example 35.3 and Exercise
35.1). As always in that analysis, there is no Interactions SS, though we can separate
off a single d.fr. from the Residual SS to test Interactions as in Example 35.3. Thus
we can check the assumption in the model that treatments and blocks do not interact.


Example 38.2 

In 38.10 we saw that neglect of the time of day at which the tests were taken, in the 
experiment of Example 38.1, renders that experiment vulnerable to criticism. If now 
we treat time of day as a nuisance factor, and arrange the experiment in randomized 
blocks (each block consisting of a single time of day), this criticism is met. 


Economizing degrees of freedom: two nuisance factors 

38.31 It will be seen from (38.39) that (b — 1) d.fr. are absorbed by the b blocks used. 
If it is necessary, owing to great variability between units, even within blocks, to keep the 
number of units per block very small, the number of blocks must be correspondingly 
increased to obtain a given number of observations. Thus a large number of degrees 
of freedom will be absorbed by the blocks, and it may be necessary to seek an alter- 
native experiment design to avoid this—a situation with b much greater than t clearly 
leaves much to be desired if bt is small. 

To economize the number of d.fr. lost in eliminating blocks, we must arrange that 
the blocks do not each require a parameter as in 38.19. A simple way of achieving this 
is to arrange the blocks themselves in a two-way classification, say with n rows and
m columns; there are k units in each block as before. We now assume that the nm = b
block parameters $\beta_{lj}$ satisfy
$$\beta_{lj} = \rho_l + \gamma_j, \qquad l = 1, 2, \ldots, n;\ j = 1, 2, \ldots, m. \tag{38.53}$$
Thus we are, in effect, using the blocks to eliminate the effects of two nuisance factors,
corresponding to the row- and column-classifications of the blocks.

As before, we impose the constraints
$$\sum_{l=1}^{n} \rho_l = \sum_{j=1}^{m} \gamma_j = 0 \tag{38.54}$$
to ensure that the treatment effects can be identified. There will thus be only
(n−1)+(m−1) d.fr. absorbed by the blocks.
We may now sketch the LS analysis, which is quite analogous to that of 38.15-26. 


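38.32 There are now nm treatment matrices $\mathbf{t}_{lj}$ as in 38.16. We let $\mathbf{n}$ be the
incidence matrix of order (t × n) relating to the rows, so that $n_{il}$ is 1 or 0 according as
the ith treatment occurs in the lth row (irrespective of column); similarly, $\mathbf{m}$ is the
incidence matrix for columns, of order (t × m). We then have, as at (38.3),
$$\mathbf{n}\mathbf{1}_n = \mathbf{m}\mathbf{1}_m = \mathbf{r}. \tag{38.55}$$
To implement (38.54), (38.8) is replaced by
$$\mathbf{1}_n'\boldsymbol{\rho} = \mathbf{1}_m'\boldsymbol{\gamma} = 0, \tag{38.56}$$
and instead of $\boldsymbol{\alpha}$ defined at (38.10), we have
$$\boldsymbol{\alpha} = \mathbf{u}\boldsymbol{\rho}, \qquad \boldsymbol{\delta} = \mathbf{v}\boldsymbol{\gamma}, \tag{38.57}$$
where $\boldsymbol{\alpha}$ is of order (n−1), $\boldsymbol{\delta}$ of order (m−1) and $\mathbf{u}$, $\mathbf{v}$ are analogues of $\mathbf{u}$ in
38.20.

(38.12) now holds with $\mathbf{y}$ a vector containing nm vectors $\mathbf{y}_{lj}$,
$$\underset{nmk \times (t+n+m-2)}{\mathbf{X}} = \begin{pmatrix} \mathbf{t}_{11} & \mathbf{1}_k\mathbf{u}_1' & \mathbf{1}_k\mathbf{v}_1' \\ \vdots & \vdots & \vdots \\ \mathbf{t}_{lj} & \mathbf{1}_k\mathbf{u}_l' & \mathbf{1}_k\mathbf{v}_j' \\ \vdots & \vdots & \vdots \\ \mathbf{t}_{nm} & \mathbf{1}_k\mathbf{u}_n' & \mathbf{1}_k\mathbf{v}_m' \end{pmatrix} \tag{38.58}$$
and
$$\underset{(t+n+m-2) \times 1}{\boldsymbol{\theta}} = \begin{pmatrix} \boldsymbol{\tau} \\ \boldsymbol{\alpha} \\ \boldsymbol{\delta} \end{pmatrix}. \tag{38.59}$$
We now find that (cf. (38.15))
$$\mathbf{X}'\mathbf{X} = \begin{pmatrix} \operatorname{diag}(\mathbf{r}) & \mathbf{n}\mathbf{u}' & \mathbf{m}\mathbf{v}' \\ \mathbf{u}\mathbf{n}' & (mk)\mathbf{I}_{n-1} & \mathbf{0} \\ \mathbf{v}\mathbf{m}' & \mathbf{0} & (nk)\mathbf{I}_{m-1} \end{pmatrix}, \tag{38.60}$$
and if we write, analogously to (38.23),
$$\boldsymbol{\Omega} = \{\operatorname{diag}(\mathbf{r}) - \mathbf{n}\mathbf{n}'/(mk) - \mathbf{m}\mathbf{m}'/(nk) + 2\mathbf{r}\mathbf{r}'/(nmk)\}^{-1}, \tag{38.61}$$
the inverse of (38.60) is
$$(\mathbf{X}'\mathbf{X})^{-1} = \begin{pmatrix} \boldsymbol{\Omega} & -\boldsymbol{\Omega}\mathbf{n}\mathbf{u}'/(mk) & -\boldsymbol{\Omega}\mathbf{m}\mathbf{v}'/(nk) \\ -\mathbf{u}\mathbf{n}'\boldsymbol{\Omega}/(mk) & \{\mathbf{I}_{n-1} + \mathbf{u}\mathbf{n}'\boldsymbol{\Omega}\mathbf{n}\mathbf{u}'/(mk)\}/(mk) & \mathbf{u}\mathbf{n}'\boldsymbol{\Omega}\mathbf{m}\mathbf{v}'/(nmk^2) \\ -\mathbf{v}\mathbf{m}'\boldsymbol{\Omega}/(nk) & \mathbf{v}\mathbf{m}'\boldsymbol{\Omega}\mathbf{n}\mathbf{u}'/(nmk^2) & \{\mathbf{I}_{m-1} + \mathbf{v}\mathbf{m}'\boldsymbol{\Omega}\mathbf{m}\mathbf{v}'/(nk)\}/(nk) \end{pmatrix} \tag{38.62}$$
so that (38.40) remains true. Just as at (38.18), we may write
$$\mathbf{X}'\mathbf{y} = \begin{pmatrix} \mathbf{T} \\ \mathbf{u}\mathbf{R} \\ \mathbf{v}\mathbf{C} \end{pmatrix}, \tag{38.63}$$
where $\mathbf{T}$, $\mathbf{R}$ and $\mathbf{C}$ are respectively the vectors of treatment, row and column totals.
G is the grand total as before.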


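38.33 Multiplication of (38.62) by (38.63) gives the LS estimators. Instead of
(38.19), we now find
$$\hat{\boldsymbol{\tau}} = \boldsymbol{\Omega}\{\mathbf{T} - \mathbf{n}\mathbf{u}'\mathbf{u}\mathbf{R}/(mk) - \mathbf{m}\mathbf{v}'\mathbf{v}\mathbf{C}/(nk)\}, \tag{38.64}$$
which simplifies as at (38.24) to
$$\hat{\boldsymbol{\tau}} = \boldsymbol{\Omega}\{\mathbf{T} - \mathbf{n}\mathbf{R}/(mk) - \mathbf{m}\mathbf{C}/(nk) + 2\mathbf{r}G/(mnk)\}. \tag{38.65}$$
As at (38.32), we find
$$\hat{\boldsymbol{\rho}} = (\mathbf{R} - \mathbf{n}'\hat{\boldsymbol{\tau}})/(mk), \qquad \hat{\boldsymbol{\gamma}} = (\mathbf{C} - \mathbf{m}'\hat{\boldsymbol{\tau}})/(nk). \tag{38.66}$$
The Residual SS is therefore, as at (38.33),
$$S_0 = \mathbf{y}'\mathbf{y} - \mathbf{T}'\hat{\boldsymbol{\tau}} - \mathbf{R}'\hat{\boldsymbol{\rho}} - \mathbf{C}'\hat{\boldsymbol{\gamma}}, \tag{38.67}$$
and as in 38.25, if
$$\hat{\boldsymbol{\tau}} = \hat{\tau}\mathbf{1}_t, \tag{38.68}$$
we find that the Residual SS is
$$S_1 = \mathbf{y}'\mathbf{y} - \mathbf{R}'\mathbf{R}/(mk) - \mathbf{C}'\mathbf{C}/(nk) + G^2/(nmk), \tag{38.69}$$
so that the SS for treatment differences is, from (38.67) and (38.69),
$$S_1 - S_0 = \mathbf{P}'\boldsymbol{\Omega}\mathbf{P} - G^2/(nmk), \tag{38.70}$$
where we now write $\mathbf{P}$ for the matrix in braces on the right of (38.65). (38.70) is the
analogue of (38.38). The AV table may now be constructed just as at (38.39), with
(n+m−2) d.fr. in all for the block effects. We leave this, and the verification of the
formulae above, to the reader as Exercise 38.5.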


Latin squares 

38.34 The elimination of two nuisance factors by the method 38.31-3 is most
commonly effected with n = m and k = 1. Each block then consists of a single unit,
and the units are in a square array. In 38.31-3, we have said nothing about how the
t treatments are to be allocated to the blocks. We now assume that t = n = m, so
that we have an array of $t^2$ units to which t treatments are to be randomly allocated.
If we impose the condition that each treatment be allocated just once to each row
and just once to each column of the array, we obtain the arrangement known as a Latin
square. It is exemplified by (38.71), with t = 4 and the treatments labelled as A, B,
C, D:

    A B C D
    B A D C
    C D A B                (38.71)
    D C B A

Euler studied Latin squares extensively from a purely mathematical viewpoint 
in the eighteenth century. The fact that they have come to be useful in the design 
of experiments in the present century is a notable example of the possible ultimate 
practical value of apparently useless theorizing. 


38.35 If we look only at the rows of (38.71), we have a randomized blocks design 
with b = t; and similarly if we look only at the columns. We may thus use the AV 
table (38.52) with b = t and either set of block differences in the second row of the 
table. This leads us to expect that we should be able to separate off both sets of block 
differences to obtain an AV table which in outline is: 


$$\begin{array}{ll}
\text{Source of variation} & \text{D.fr.} \\
\hline
\text{Treatment differences} & t-1 \\
\text{Rows} & t-1 \\
\text{Columns} & t-1 \\
\text{Residual} & (t-1)(t-2) \\
\text{General mean} & 1 \\
\hline
\text{TOTAL} & t^2
\end{array} \tag{38.72}$$

In fact, this is immediately deducible from the results of 38.33, by putting n = m = t
and k = 1. We leave this to the reader as Exercise 38.6.

If the two nuisance factors interact, contrary to the assumption of their additivity 
made at (38.53), the analysis may be seriously in error (cf. Scheffé (1959)). 


38.36 For a randomized blocks design, there is no difficulty in choosing one at 
random—we require only random permutations of numbers from 1 to t (cf. 9.14 and 
Example 9.7) to choose with equal probabilities from among the (t!) possible allocations 
of experimental units to treatments. 

For Latin squares, however, the choice of a design at random is less straightforward, 
since it is not at all obvious how many squares of a given order exist, though it is evident 
from consideration of the cyclic permutations of the elements of the first row that 
some Latin squares of any order do exist. 
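
The existence argument from cyclic permutations of the first row can be made concrete. The sketch below is an illustration with hypothetical function names: it builds the cyclic square of any order t and checks the defining property that each symbol occurs just once in every row and every column.

```python
def cyclic_latin_square(t):
    """Row i is the first row (0, 1, ..., t-1) shifted cyclically by i places."""
    return [[(row + col) % t for col in range(t)] for row in range(t)]


def is_latin_square(square):
    """True if every symbol 0..t-1 occurs exactly once in each row and column."""
    t = len(square)
    symbols = set(range(t))
    rows_ok = all(len(row) == t and set(row) == symbols for row in square)
    cols_ok = rows_ok and all({row[c] for row in square} == symbols for c in range(t))
    return rows_ok and cols_ok


# Display the cyclic square of order 5 with letters for the treatments.
for row in cyclic_latin_square(5):
    print(" ".join("ABCDE"[s] for s in row))
```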

The number of possible Latin squares of order t is very large for high values of
t. There are, for example, 576 squares of order 4; 161,280 squares of order 5; and
812,851,200 of order 6. Up to order 7, they have been counted. Although many
examples of squares of higher orders are known, the problem of enumeration for
t ≥ 8 awaits solution. Details and examples will be found in Fisher and Yates’ Statistical
Tables.

By interchanging rows and columns the square can always be brought to a form in
which the top row and left-hand column are in the order ABC, etc. It is then said
to be a “standard square.” For instance, there are four standard squares of the fourth
order:

    A B C D     A B C D     A B C D     A B C D
    B A D C     B C D A     B D A C     B A D C
    C D B A     C D A B     C A D B     C D A B        (38.73)
    D C A B     D A B C     D C B A     D C B A



From each of these, 144 (= 4! 3!) squares may be derived by permuting all columns,
and all rows except the first. (There is no point in permuting the first row, because
the result would be a repetition of squares already obtained with an interchange of
the letters A ... D, not an essentially different layout.) The total number of squares,
as stated above, is therefore 4 × 144 = 576. More generally, each standard square
yields t!(t−1)! squares of order t.
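
For order 4 the counts quoted above are small enough to confirm by direct enumeration. The following sketch (function and variable names are illustrative) generates every Latin square of order 4 row by row, picks out the standard squares, and compares the total with the number of standard squares multiplied by t!(t−1)!.

```python
from itertools import permutations
from math import factorial


def latin_squares(t):
    """Yield every Latin square of order t as a tuple of rows (symbols 0..t-1)."""
    rows = list(permutations(range(t)))

    def extend(partial):
        if len(partial) == t:
            yield tuple(partial)
            return
        for row in rows:
            # A new row is admissible if it clashes with no earlier row in any column.
            if all(row[c] != prev[c] for prev in partial for c in range(t)):
                yield from extend(partial + [row])

    yield from extend([])


t = 4
squares = list(latin_squares(t))
identity = tuple(range(t))
# Standard squares: first row and first column both in natural order.
standard = [sq for sq in squares
            if sq[0] == identity and tuple(row[0] for row in sq) == identity]

print(len(squares), len(standard), len(standard) * factorial(t) * factorial(t - 1))
```

The same brute-force approach becomes impractical very quickly as t grows, which is why tables of standard squares are used instead.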

It is thus only necessary to specify the standard squares. To select a Latin square 
at random, we choose a standard form at random and then permute rows and columns 
at random, the randomizing process being most conveniently carried out by using 
tables of random permutations (cf. 9.14 and Example 9.7). For squares of order 8 
or more, where the standard types have not been enumerated, we can only choose 
one of those which has, and hence select one at random from a restricted set of all 
possible squares. 
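The counting argument and the random selection just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four standard squares of order 4 are typed in by hand, and the function names (`is_latin`, `rearrange`, `all_squares_of_order_4`, `random_latin_square`) are hypothetical, not taken from any table or library.

```python
import random
from itertools import permutations

# The four standard squares of order 4, each row written as a string.
STANDARD_SQUARES = [
    ["ABCD", "BADC", "CDBA", "DCAB"],
    ["ABCD", "BCDA", "CDAB", "DABC"],
    ["ABCD", "BDAC", "CADB", "DCBA"],
    ["ABCD", "BADC", "CDAB", "DCBA"],
]

def is_latin(square):
    """True if every letter occurs exactly once in each row and each column."""
    letters = set(square[0])
    rows_ok = all(set(row) == letters for row in square)
    cols_ok = all(set(col) == letters for col in zip(*square))
    return rows_ok and cols_ok

def rearrange(square, row_order, col_order):
    """Return the square with its rows, then its columns, taken in the given orders."""
    return tuple("".join(square[i][j] for j in col_order) for i in row_order)

def all_squares_of_order_4():
    """Permute all columns, and all rows except the first, of each standard square."""
    found = set()
    for std in STANDARD_SQUARES:
        for rows in permutations(range(1, 4)):
            for cols in permutations(range(4)):
                found.add(rearrange(std, (0,) + rows, cols))
    return found

def random_latin_square(rng=random):
    """Choose a standard square at random, then permute rows 2-4 and all columns at random."""
    std = rng.choice(STANDARD_SQUARES)
    rows = [0] + rng.sample(range(1, 4), 3)
    cols = rng.sample(range(4), 4)
    return rearrange(std, rows, cols)
```

Each standard square contributes 4! 3! = 144 distinct arrangements, so the set returned by `all_squares_of_order_4()` should contain 4 × 144 = 576 squares, and `random_latin_square()` draws one of them with equal probability.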


Three or more nuisance factors: Graeco-Latin and orthogonal squares

38.37 There is no difficulty in generalizing the Latin square to provide for
the elimination of three or more nuisance factors. We can do this by a process of
superposition of different Latin squares. If the Latin square

A B C D
C D A B          (38.74)
D C B A
B A D C

is superposed upon the square (38.71), with the Roman letters of (38.74) first changed to
the corresponding Greek letters to avoid confusion, we obtain the arrangement

Aα Bβ Cγ Dδ
Bγ Aδ Dα Cβ          (38.75)
Cδ Dγ Aβ Bα
Dβ Cα Bδ Aγ

in which each combination of a Roman with a Greek letter appears just once. The
squares (38.71) and (38.74) are said to be orthogonal (Latin) squares for this reason.
Their superposition (38.75) is called a Graeco-Latin square.

There is evidently no Graeco-Latin square when t = 2. More remarkably, there
is none when t = 6 even though there are 812,851,200 Latin squares for t = 6. Euler
conjectured that no Graeco-Latin square exists when t = 4r+2, and it has taken
nearly two centuries to disprove his conjecture and show (Bose and Shrikande, 1959,
1960; Bose et al., 1960) that a Graeco-Latin square exists except only when t = 2
or 6. Fisher and Yates’ Tables give examples.

The Greek letters in (38.75) may be used to identify a third nuisance factor (rows
and columns, as before, identifying the first two), while the Roman letters are the
treatments as before. The design then eliminates the effects of three nuisance factors
in exactly the same way as the Latin square eliminates the effects of two. The AV
table is an obvious generalization of Exercise 38.6, which we leave to the reader there.


38.38 A further Latin square (using a third set of symbols, say numerals) may be
superposed upon (38.75) so that each combination of any two sets of symbols occurs
just once, and the three Latin squares are mutually orthogonal. This is true for the
arrangement

Aα1 Bβ2 Cγ3 Dδ4
Bγ4 Aδ3 Dα2 Cβ1          (38.76)
Cδ2 Dγ1 Aβ4 Bα3
Dβ3 Cα4 Bδ1 Aγ2

which is called a hyper-Graeco-Latin square. If the Greek letters and numerals are
used to identify the third and fourth nuisance factors, this design will eliminate the
effects of four nuisance factors. The AV is again left to the reader in Exercise 38.6.
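The defining property of orthogonal squares, that every ordered pair of symbols occurs exactly once on superposition, is easy to check mechanically. The following Python sketch does so for the three component squares of the hyper-Graeco-Latin square of order 4; the lower-case letters a to d are an assumed stand-in for the Greek letters, and the function names are hypothetical.

```python
from itertools import combinations

# Component squares of the order-4 hyper-Graeco-Latin arrangement.
ROMAN = ["ABCD", "BADC", "CDAB", "DCBA"]
GREEK = ["abcd", "cdab", "dcba", "badc"]    # a..d stand for alpha..delta
NUMERAL = ["1234", "4321", "2143", "3412"]

def are_orthogonal(sq1, sq2):
    """True if superposing the two squares yields every ordered pair exactly once."""
    t = len(sq1)
    pairs = {(sq1[i][j], sq2[i][j]) for i in range(t) for j in range(t)}
    return len(pairs) == t * t

def mutually_orthogonal(squares):
    """True if every pair of squares in the list is orthogonal."""
    return all(are_orthogonal(a, b) for a, b in combinations(squares, 2))
```

Since a square of order t has only t² cells, obtaining t² distinct pairs is the same as obtaining each pair just once. A square is never orthogonal to itself, which gives a simple negative check.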


38.39 No further Latin square can be superposed upon (38.76) while maintaining
orthogonality; no more than (t−1) Latin squares of order t can achieve this. The
simple proof is left to the reader as Exercise 38.7. A set of (t−1) orthogonal squares
of order t, like (38.76), is called a complete set of orthogonal Latin squares. Such
complete sets exist if t is prime or a power of a prime (cf. Mann (1949)), and hence
for all $t \leq 9$ except 2 and 6, when we have seen in 38.37 that not even a pair of orthogonal
squares exists. For 10 <t<30, Barra (1965) gives details of the known sets of orthogonal
squares. The complete sets have been enumerated for t<7. Fisher and Yates’
Tables give examples for t<9.
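For prime t, one standard construction of a complete set (not necessarily the one used in the references cited here) takes the ath square to have entry (a·i + j) mod t in row i and column j, for a = 1, ..., t−1. The Python sketch below, with hypothetical function names, builds these squares and checks that they are Latin and mutually orthogonal.

```python
from itertools import combinations

def complete_set(t):
    """Return t-1 squares with entries (a*i + j) mod t, for a = 1, ..., t-1.

    The squares are Latin and mutually orthogonal when t is prime; for
    composite t the same formula can fail even to give Latin squares.
    """
    return [[[(a * i + j) % t for j in range(t)] for i in range(t)]
            for a in range(1, t)]

def is_latin(square):
    """True if each symbol 0..t-1 occurs once in every row and every column."""
    symbols = set(range(len(square)))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return rows_ok and cols_ok

def are_orthogonal(sq1, sq2):
    """True if superposition gives each ordered pair of symbols exactly once."""
    t = len(sq1)
    pairs = {(sq1[i][j], sq2[i][j]) for i in range(t) for j in range(t)}
    return len(pairs) == t * t
```

For t = 4, which is a prime power but not a prime, this modular formula breaks down (the square with a = 2 repeats rows), so a construction over the finite field of order 4 would be needed instead.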

For details of the theory and construction of Latin squares and orthogonal sets
of them, into which we shall not enter here, the reader should refer to the pair of
monographs by Vajda (1967a, b), to Mann’s (1949) earlier account of the methods
due to R. C. Bose, and to the review paper by Barra (1965), which contains a bibliography
of the subject subsequent to the review by Norton (1939).


38.40 The practical usefulness of Latin squares in experimental work is restricted 
by the condition that the number of treatments must be the same as the number of 
levels for each nuisance factor, and this restriction increases as we pass through the 
Graeco-Latin to the higher-order sets of orthogonal squares. In consequence, these 
latter arrangements are little used. However, Latin squares are frequently used in 
agricultural experimentation, where the rows and columns of the square array represent 
the physical rows and columns in which the experimental plots (units) are laid out. 
In this way, soil fertility gradients across the experimental area in these two directions 
will have no effect on the treatments. Of course, there may be fertility gradients in 
other directions, e.g. diagonally to the square array, which the Latin square arrangement 
does not eliminate. The experimenter will, however, choose the orientation of his 
rows and columns to eliminate known or likely fertility gradients. 

It is clear that Latin square arrangements are of possible use whenever there are 
two geographical or temporal co-ordinates to be eliminated, and similarly that the 
higher-order arrangements may be called on if there are three or more such nuisance 
factors. 


Example 38.3 

In the experiment discussed in Examples 38.1-2, it might be suspected that the
day of the week on which the tests are carried out also influences the result. The
hypothesis here would be that there is some kind of cumulative fatigue through the
working week, acting similarly to the “time of day” effect already discussed. We
can eliminate the effect of both these nuisance factors by choosing as many times of
day as there are days in the working week (say 5) and arranging the experiment in
a 5 × 5 Latin square design. Notice that only 5 treatments (alcohol-doses) would
be possible if we used a single square. There is, however, nothing to prevent our using
as many 5 × 5 squares as are required to test all the proposed treatments, so long as
we make the latter a multiple of 5.

If, in addition to time of day and day of the week, place of work were regarded 
as a third nuisance factor influencing the experiment, we could choose the participating 
drivers from five work-places and arrange the experiment as a Graeco-Latin square. 
But it is precisely the inflexibility of having to choose five work-places and five times 
of day, merely because the number of working days is fixed at five, which often makes 
these designs inconvenient. 
Randomization and robustness in randomized blocks and Latin squares

38.41 In 38.18 we left aside the question of the effect upon inference of the random 
allocation of treatments to units within the blocks of a block experiment. The same 
question arises with respect to the random allocation of treatments to units in a Latin 
square (cf. 38.34). We must now examine this question in some detail. 

The question is a double-sided one. In the first place, since we are (as a general 
precaution against biassed inference) carrying out certain physical acts of randomization 
when we allocate treatments to experimental units, we may ask how this will affect 
the validity of the inferences drawn from a linear model with homoscedastic uncorrelated 
errors, which become independent when (as we must for hypothesis-testing) we assume 
their normality. Put this way, it is a question concerning the robustness of our in- 
ferences. 

However, the question may be put more directly and ambitiously. The randomiza- 
tions generate equiprobable sets of observations, and we have seen in Chapter 31 
and subsequently that consideration of permutation distributions can permit us to 
replace normal distribution theory by more general (distribution-free) methods. These 
often lose little or no efficiency even when the normality assumption is valid, and may 
be much more efficient when it is not. We may ask whether randomization can here 
perform the same service of freeing us from dependence on the normality assumption. 


38.42 A detailed account of randomization theory in randomized blocks and 
Latin squares is contained in the penultimate chapter of Scheffé (1959), which gives 
references to the literature. From the point of view of estimation, the most interesting 
results are those for the expected values of MS in the AV tables, quoted from Kempthorne
(1952) and from Wilk and Kempthorne (1957) (cf. also D. R. Cox (1958b)).

For the randomized blocks design, the expected MS for treatments is less than that 
for the Residual by a term depending upon the interactions between blocks and treat- 
ments, as well as exceeding it by the usual term depending upon treatment effects. 
(It is noteworthy that the presence of interactive errors (cf. 36.41) between treatments
and units within blocks does not affect the situation.) The Residual MS is thus inflated
by blocks-treatments interactions. However, this difficulty disappears if (as is
often appropriate) block effects are treated as random, rather than fixed, effects; the
Residual MS may then properly be used to assess the magnitude of the MS for treatments.

For the Latin square, the situation is more complicated, for here interactive errors 
do have some effect upon the comparison of the MS for treatments with that for Residual. 
No simple result emerges, for essentially the reason discussed in 36.42.


38.43 We now consider the testing of treatment effects under randomization 
theory. For the case where there are no unit errors (cf. 36.41) we have already developed 
the theory in 37.39-41, where we were concerned with a permutation test for column- 
effects in a two-way classification with one observation per cell. We have seen in 
38.30 that the randomized blocks situation is formally identical with such a classification. 
Thus the results of 37.39-41 hold for testing treatment effects in randomized blocks, the 
rows there being interpreted as blocks and the columns as treatments. We therefore 
see that we may apply the usual AV test for treatments in (38.52), with d.fr. adjusted 
by the method of 37.29 as indicated in 37.40. (A few sampling experiments by Collier
and Baker (1966) indicate that the power of the usual F-test is also robust to non-normality.)
Alternatively, we may use distribution-free tests based on ranks or normal
scores as in 37.41 and Exercise 37.14, with negligible correction to the d.fr. if either 
the number of treatments or the number of blocks is not too small. 
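When the experiment is small, the permutation distribution itself can be enumerated rather than approximated. The following Python sketch is an illustration only, with invented data and hypothetical function names: it lists every within-block rearrangement of the observations in a randomized blocks layout and counts those whose statistic is at least as large as the one observed. It uses the sum of squared treatment totals as the statistic; because the block totals, grand total and total SS are unchanged by within-block permutation, this is a monotone function of the usual AV ratio and orders the arrangements identically.

```python
from itertools import permutations, product

def treatment_statistic(blocks):
    """Sum of squared treatment totals; blocks[j][i] is treatment i in block j."""
    totals = [sum(column) for column in zip(*blocks)]
    return sum(T * T for T in totals)

def exact_permutation_count(blocks):
    """Count arrangements with statistic >= observed, over all (t!)^b arrangements."""
    observed = treatment_statistic(blocks)
    per_block = [list(permutations(block)) for block in blocks]
    count = total = 0
    for arrangement in product(*per_block):
        total += 1
        if treatment_statistic(arrangement) >= observed:
            count += 1
    return count, total

# Invented data: 4 blocks (rows) by 3 treatments (columns), integer yields.
blocks = [[10, 20, 30], [12, 21, 33], [9, 22, 31], [11, 19, 34]]
count, total = exact_permutation_count(blocks)
p_value = count / total
```

With these data the treatments are ordered identically in every block, so only the 3! relabellings of the observed arrangement reach the observed value, and the exact significance level is 6/1296.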


Doksum (1967) and Hollander (1967) consider testing the treatments in a randomized
blocks experiment against ordered alternatives of the form (31.156). A test based on
Wilcoxon’s symmetry test (cf. 31.79) has very high efficiency, compared to “Student’s”
t-test, against location-shift alternatives (always > 0.864, and > 0.563 in the normal case).
The analogue of (31.151) performs much less well.


If there are unit errors, Ogawa (1961, 1963) shows that the standard F-test may still
be justified as an approximation if the variances of unit effects within blocks are nearly 
constant and the number of blocks is large enough. 

Even in the absence of unit errors, the permutation test for treatment effects in 
Latin squares is less satisfactory than that for randomized blocks, just as we saw for 
estimation in 38.42. As before, the expected value of the usual AV test statistic is the 
same under randomization as under normal theory, but the variance is complicated, and 
in consequence the evidence for the approximation of the permutation distribution by 
normal theory is very limited (Welch, (1937); Scheffé (1959)). 


38.44 The fact that the evidence for the validity of normal theory tests in ran- 
domized Latin squares is flimsy, together with the even greater paucity of such evidence 
for most other, more complicated, experiment designs, leads one to doubt the prevailing 
serene assumption that randomization theory will always approximate normal theory.

There is a question of principle involved here. Is randomization to be explicitly 
incorporated into the theory underlying our tests and estimation procedures? Since
randomization lies at the root of the modern approach to statistical inference (cf. 38.7), 
it seems difficult not to answer this question in the affirmative, and consequently 
difficult to defend the relative neglect of this admittedly complicated branch of distri- 
bution theory. 


The variances of treatment differences in block experiments 

38.45 We now return to the problem of design in the general analysis of block 
experiments given in 38.19-26 above. Instead of requiring, as we did in 38.27, that 
the treatment parameter estimators should be orthogonal (leading to the randomized 
blocks design, as we found in 38.28-30), we now formulate the design problem in 
terms of the variances of the differences between these estimators. 

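Suppose that we wish the experiment to result in the variance of the difference
between the $i$th and $l$th treatment parameter estimators being $2\sigma^2 d_{il}$, say. We write
$\sigma^2\omega_{il}$ for the $(i, l)$th element of the dispersion matrix of the treatment parameter
estimators, equal to $\sigma^2\Omega$ by (38.40) with $\Omega$ defined by (38.23). Then

$$\omega_{ii} - 2\omega_{il} + \omega_{ll} = 2d_{il} \tag{38.77}$$

so that

$$\omega_{il} = \tfrac{1}{2}(\omega_{ii} + \omega_{ll}) - d_{il}. \tag{38.78}$$

If we write $\mathbf{w}$ for the vector with elements $\{\tfrac{1}{2}\omega_{ii}\}$ and $\mathbf{D}$ for the matrix with elements
$\{-d_{il}\}$, with $d_{ii} = 0$ by (38.77), we may write (38.78) in the form

$$\Omega = \mathbf{w}\mathbf{1}_t' + \mathbf{1}_t\mathbf{w}' + \mathbf{D} = (\mathbf{w} \mid \mathbf{1}_t)\begin{pmatrix}\mathbf{1}_t' \\ \mathbf{w}'\end{pmatrix} + \mathbf{D}. \tag{38.79}$$

Inspection of (38.79) makes it clear that it is identical with

$$\Omega = (\mathbf{w} - \tfrac{1}{2}c\mathbf{1}_t)\mathbf{1}_t' + \mathbf{1}_t(\mathbf{w} - \tfrac{1}{2}c\mathbf{1}_t)' + (\mathbf{D} + c\mathbf{1}_t\mathbf{1}_t') \tag{38.80}$$

whatever the scalar $c$ may be. Since $\mathbf{w}$ (a function of $\Omega$) is itself at choice in the design
of the block experiment, no generality is lost by continuing to use (38.79), with the
proviso that $\mathbf{D}$ therein may be replaced by any matrix which differs from $\mathbf{D}$ by a scalar
multiple of $\mathbf{1}_t\mathbf{1}_t'$. Such a matrix is said to be of class D.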


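38.46 Suppose now that $\mathbf{D}$, a constant times the desired matrix of the variances of
treatment differences, has all its off-diagonal elements equal, so that we desire all
differences to be estimated with the same precision. (The leading diagonal of $\mathbf{D}$,
of course, contains zeros.) This can, by (38.77), be achieved by choosing the dispersion
matrix $\Omega$ of the treatment parameter estimators so that its diagonal elements $\omega_{ii}$ are
all equal, and its off-diagonal elements $\omega_{il}$ are all equal. In this case, $\mathbf{w}$ is itself a
scalar multiple of $\mathbf{1}_t$, and we can choose $c$ in (38.80) so that $\mathbf{w} - \tfrac{1}{2}c\mathbf{1}_t = \mathbf{0}$. (38.80)
then becomes

$$\Omega = \mathbf{D}_c, \tag{38.81}$$

where $\mathbf{D}_c$ stands for a particular matrix of class D. (38.81) and (38.23) give

$$\mathbf{D}_c^{-1} = \operatorname{diag}(\mathbf{r}) - \mathbf{n}\mathbf{n}'/k + \mathbf{r}\mathbf{r}'/(bk)$$

or

$$\mathbf{n}\mathbf{n}' = k\{\operatorname{diag}(\mathbf{r}) - \mathbf{D}_c^{-1} + \mathbf{r}\mathbf{r}'/(bk)\}. \tag{38.82}$$

(38.28) and (38.81) give

$$\mathbf{r} = \mathbf{D}_c^{-1}\mathbf{1}_t, \tag{38.83}$$

and (38.83) used in (38.27) gives

$$bk = \mathbf{1}_t'\mathbf{D}_c^{-1}\mathbf{1}_t. \tag{38.84}$$

Substitution of (38.83-4) into (38.82) gives the alternative form

$$\mathbf{n}\mathbf{n}' = k\left\{\operatorname{diag}(\mathbf{D}_c^{-1}\mathbf{1}_t) - \mathbf{D}_c^{-1} + \frac{\mathbf{D}_c^{-1}\mathbf{1}_t\mathbf{1}_t'\mathbf{D}_c^{-1}}{\mathbf{1}_t'\mathbf{D}_c^{-1}\mathbf{1}_t}\right\}. \tag{38.85}$$

Although we have derived (38.85) under the special assumption that $\mathbf{D}$ has all
off-diagonal elements equal, it holds quite generally for any $\mathbf{D}$, as Tocher (1952)
showed directly (cf. Exercise 38.8).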


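38.47 Still considering the special assumption of 38.46, we see that if the off-
diagonal elements of $\mathbf{D}$ are all equal to $-1/a$ (corresponding to variances of differences
all equal to $2\sigma^2/a$), we have

$$\mathbf{D} = \frac{1}{a}(\mathbf{I}_t - \mathbf{1}_t\mathbf{1}_t') \tag{38.86}$$

and since

$$\mathbf{D}_c = \mathbf{D} + c\mathbf{1}_t\mathbf{1}_t',$$

$$\mathbf{D}_c = \frac{1}{a}\{\mathbf{I}_t + (ac-1)\mathbf{1}_t\mathbf{1}_t'\}. \tag{38.87}$$

The inverse of (38.87) may be verified to be

$$\mathbf{D}_c^{-1} = a\left\{\mathbf{I}_t - \frac{(ac-1)}{1 + t(ac-1)}\mathbf{1}_t\mathbf{1}_t'\right\},$$

which we simplify to

$$\mathbf{D}_c^{-1} = a(\mathbf{I}_t + A\mathbf{1}_t\mathbf{1}_t'), \tag{38.88}$$

where $A = (1-ac)/\{1 + t(ac-1)\}$. We now find for (38.83)

$$\mathbf{r} = \mathbf{D}_c^{-1}\mathbf{1}_t = a(1 + At)\mathbf{1}_t = r\mathbf{1}_t, \tag{38.89}$$

where $r = a(1 + At)$. We thus see that each treatment occurs in the experiment the
same number, $r$, of times. Using (38.88-9), we find for (38.85)

$$\mathbf{n}\mathbf{n}' = k\{(r-a)\mathbf{I}_t + (a/t)\mathbf{1}_t\mathbf{1}_t'\}. \tag{38.90}$$

It will be seen that the arbitrary constant $A$ has now disappeared; it is only relevant
in determining $r$.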
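The algebra leading from (38.87) to (38.90) can be confirmed with exact rational arithmetic. The Python sketch below is a check of the formulae only, under the assumption that all matrices involved have the form xI + y11'; the helper names are hypothetical, and the values of t, a, c and k passed in are arbitrary and need not correspond to any realizable design.

```python
from fractions import Fraction

def scaled_identity_plus_ones(x, y, t):
    """Return the t x t matrix x*I + y*1 1' with exact rational entries."""
    return [[Fraction(x) * (i == j) + Fraction(y) for j in range(t)]
            for i in range(t)]

def matmul(P, Q):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def check(t, a, c, k):
    """Return three booleans: inverse correct, r constant, (38.85) equals (38.90)."""
    a, c = Fraction(a), Fraction(c)
    D_c = scaled_identity_plus_ones(1 / a, (a * c - 1) / a, t)      # (38.87)
    A = (1 - a * c) / (1 + t * (a * c - 1))
    D_c_inv = scaled_identity_plus_ones(a, a * A, t)                # (38.88)
    inverse_ok = matmul(D_c, D_c_inv) == scaled_identity_plus_ones(1, 0, t)
    r_vec = [sum(row) for row in D_c_inv]                           # (38.83)
    r = a * (1 + A * t)                                             # (38.89)
    bk = sum(r_vec)                                                 # (38.84)
    lhs = [[k * (r_vec[i] * (i == j) - D_c_inv[i][j] + r_vec[i] * r_vec[j] / bk)
            for j in range(t)] for i in range(t)]                   # (38.85)
    rhs = scaled_identity_plus_ones(k * (r - a), k * a / t, t)      # (38.90)
    return inverse_ok, r_vec == [r] * t, lhs == rhs
```

For example, `check(4, 2, 1, 2)` exercises the case t = 4, a = 2, c = 1, k = 2, for which A = −1/5.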


The design equation 

38.48 An equation for $\mathbf{n}\mathbf{n}'$ is called a design equation. Because of the definition
of $\mathbf{n}$ in 38.16, we see that the $(i, l)$th element of $\mathbf{n}\mathbf{n}'$ counts the number of times (say
$\lambda_{il}$) in the experiment as a whole that the $i$th and $l$th treatments occur together in the
same block. In particular, when $i = l$, the diagonal elements of $\mathbf{n}\mathbf{n}'$ are simply the
frequencies $r_i$ with which the $i$th treatment occurs in the experiment.


Balanced incomplete blocks designs 
38.49 We now interpret the particular design equation (38.90) in the light of
38.48. Equating its diagonal elements, we have

$$\{\mathbf{n}\mathbf{n}'\}_{ii} = r = k\{(r-a) + a/t\}$$

whence

$$r = ak(t-1)/\{(k-1)t\}. \tag{38.91}$$

The off-diagonal elements of (38.90) are

$$\{\mathbf{n}\mathbf{n}'\}_{il} = ka/t = \lambda, \tag{38.92}$$

say. Thus, in the whole experiment, every pair of treatments occurs $\lambda$ times together
in the same block.
(38.91-2) give

$$r(k-1) = \lambda(t-1). \tag{38.93}$$

Since $t > k$, (38.93) implies $\lambda < r$, as is obvious from their definitions. Moreover,
since each of $t$ treatments occurs $r$ times in the experiment with $b$ blocks each containing
$k$ units, we have

$$rt = bk. \tag{38.94}$$

Using (38.92-3), (38.90) may be written

$$\mathbf{n}\mathbf{n}' = (r-\lambda)\mathbf{I}_t + \lambda\mathbf{1}_t\mathbf{1}_t'. \tag{38.95}$$

If $r = \lambda$, (38.95) is satisfied by the randomized blocks design incidence matrix (38.46),
for then $k = t$, $\lambda = b$ from (38.93-4); this is obvious from the definitions of $r$ and $\lambda$.
We henceforth exclude this case, and consider only $r > \lambda$, $t > k$.

A block experiment satisfying the design equation (38.95) and the conditions
(38.93-4) is called a balanced incomplete blocks (BIB) design. These designs were
introduced by Yates (1936a). Their characterizing features are that each of the $t$
treatments appears in $r$ of the $b$ blocks of $k$ units, while each pair of the treatments
appear together in $\lambda$ of the blocks. We were led to BIB designs by the requirement
in 38.46-7 that every treatment difference be estimated with the same variance $2\sigma^2/a$,
where $a$ is given by (38.91-2).

The conditions (38.93-4) show that any three of the five constants $t$, $k$, $\lambda$, $r$, $b$
determine the other two; commonly, the first three of these constants are used.
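The two conditions and the design equation lend themselves to a mechanical check. The Python sketch below computes r and b from t, k and λ, and verifies the concurrence pattern for one small design. The design used (t = 7, k = 3, λ = 1) is a well-known example supplied here for illustration, not one taken from the text, and the function names are hypothetical.

```python
from fractions import Fraction

def bib_constants(t, k, lam):
    """Return (r, b) from r(k-1) = lam(t-1) and rt = bk, or None if not integral."""
    r = Fraction(lam * (t - 1), k - 1)
    b = r * t / k
    if r.denominator != 1 or b.denominator != 1:
        return None
    return int(r), int(b)

def satisfies_design_equation(blocks, t, r, lam):
    """Check that nn' has r on the diagonal and lam everywhere else.

    blocks is a list of sets of treatment labels 0..t-1; the (i, l)th element
    of nn' is the number of blocks containing both treatment i and treatment l.
    """
    for i in range(t):
        for l in range(t):
            together = sum(1 for blk in blocks if i in blk and l in blk)
            if together != (r if i == l else lam):
                return False
    return True

# Illustrative design with t = 7, k = 3, lam = 1 (treatments labelled 0..6).
SEVEN_POINT_DESIGN = [{0, 1, 2}, {0, 3, 4}, {0, 5, 6}, {1, 3, 5},
                      {1, 4, 6}, {2, 3, 6}, {2, 4, 5}]
```

Here `bib_constants(7, 3, 1)` gives r = 3 and b = 7, whereas for t = 8, k = 3, λ = 1 the value of r would be 7/2, so no such design can exist.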


38.50 Although (38.93-5) are necessary conditions (to be satisfied by integers
$t$, $k$, $\lambda$, $r$, $b$) for a BIB design to exist, they are not in general sufficient conditions,
since there may be no incidence matrix $\mathbf{n}$ satisfying the design equation (38.95) for
$\mathbf{n}\mathbf{n}'$ (cf. Exercise 38.11). Further necessary conditions may be given. For example,
(38.95) implies that the $(t \times t)$ matrix $\mathbf{n}\mathbf{n}'$ is non-singular, so the $(t \times b)$ matrix $\mathbf{n}$ must
have rank $t$ and

$$b \geq t \tag{38.96}$$

(which with (38.94) implies $r \geq k$), a result originally differently found by R. A. Fisher.
In a comprehensive review of the subject, Guérin (1965) summarizes other, more
stringent, necessary conditions. (38.93-5) are known to be sufficient, as well as
necessary, for $k = 3$ or 4, and also for $k = 5$ with $\lambda = 1$, 4 or 20, results due to Hanani
(1961).

For a given BIB design, $\lambda$ can always be increased by any integral multiple $m$ by
simply repeating the whole design $m$ times; $r$ and $b$ are then similarly increased.
We shall always take $m = 1$ to make $\lambda$ as small as possible for any given design. However,
there may be more than one such “minimal” $\lambda$ for given $(t, k)$, corresponding
to different BIB designs; examples appear in the table in 38.54 below. Further,
several BIB designs of essentially different structure (non-isomorphic) may exist for
the same values of $t$, $k$ and $\lambda$. This is intuitively obvious from the fact that the values
of $r$ and $\lambda$ are first- and second-order conditions only upon the disposition of the
treatments into the blocks; they do not in general restrict the frequencies of triples,
quadruples, etc., of treatments.


38.51 If $t$ and $k$ are fixed, the third constant of the design being at choice, a BIB
design can always be formed simply by taking every one of the $\binom{t}{k}$ selections of treatments
as a block. We then have

$$b = \binom{t}{k}, \qquad r = \binom{t-1}{k-1}, \qquad \lambda = \binom{t-2}{k-2}. \tag{38.97}$$

Such a BIB design is called unreduced. Since it requires

$$bk = rt = \frac{t!}{(k-1)!\,(t-k)!}$$

observations, it is only useful in practice when $k$ or $(t-k)$ is small.
When $k = 2$, (38.93-4) show that in general

$$b = \lambda\binom{t}{2}.$$

This can only be satisfied by an unreduced design (38.97) with $\lambda = 1$.

When $k = t-1$, (38.97) becomes $b = t$, $r = t-1$, $\lambda = t-2$. This is an
example of a symmetric BIB design ($b = t$, $k = r$) which happens here also to be
unreduced.
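An unreduced design is simple to generate, and its constants can then be compared with the binomial coefficients of (38.97). The following Python sketch does this by brute force; the function names are hypothetical and treatments are labelled 0 to t−1 for convenience.

```python
from itertools import combinations
from math import comb

def unreduced_design(t, k):
    """Take every selection of k out of t treatments as a block."""
    return [set(c) for c in combinations(range(t), k)]

def replication_and_concurrence(blocks, t):
    """Return the set of replication counts and the set of pairwise concurrence counts."""
    r_values = {sum(1 for blk in blocks if i in blk) for i in range(t)}
    lam_values = {sum(1 for blk in blocks if i in blk and l in blk)
                  for i, l in combinations(range(t), 2)}
    return r_values, lam_values
```

If the design is balanced, each returned set has a single element. For t = 6 and k = 3, for instance, the design has 20 blocks, and for k = t−1 it is symmetric with b = t.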


38.52 Since each of $t$ treatments appears $r$ times, and each of $b$ blocks appears
$k$ times, in the experiment, it is tempting to interchange the role of treatments and
blocks in the design, putting $t' = b$, $b' = t$, $r' = k$ and $k' = r$ to obtain a dual design.
However, this dual will itself be a BIB design if and only if the original BIB design is
symmetric (cf. Exercise 38.12 for illustrations).

If a BIB design can be resolved into r subsets of blocks, each subset containing each 
treatment exactly once, the design is called resolvable, and each subset is a single replicate 
of the set of treatments. We must then clearly have t = ck, b = cr, where c is a positive 
integer. However, the latter is not alone a sufficient condition for resolvability. 

For resolvable BIB designs, (38.96) may be replaced by the stronger inequality,
due to Bose (1942),

$$b \geq t + r - 1 \tag{38.98}$$

which is again only a necessary condition for the existence of a resolvable BIB design, for
(38.98) actually holds whenever $t = ck$, $b = cr$ (cf. Exercise 38.13). If and only if the
equality holds in (38.98), a resolvable design has $k^2/t$ treatments common to any two
blocks in different replicates, and is called an affine resolvable BIB design (cf. Exercise
38.18).


38.53 Mann (1949) gives an account of construction methods for BIB designs 
due to R. C. Bose, whose fundamental series of papers, starting with Bose (1939), is 
listed in Guérin’s (1965) comprehensive bibliography of the subject—these and other 
methods of construction are summarized by Guérin. Vajda (1967b) treats the mathe- 
matics of the subject in detail. Muller (1965) gives a method for obtaining BIB designs 
from complete sets of orthogonal Latin squares when t is an integral power of a prime
number (cf. Exercises 38.10-11 for some simple examples of such constructions given
earlier in Fisher and Yates’ Tables). If t is odd and k does not exceed the smallest prime
factor of t, a BIB may always be constructed by the method given in Exercise 38.20. If
t is prime, this method is valid for any k<t.


38.54 Fisher and Yates’ Tables give indexes, by the values of $k$ and of $r$, of all
known BIB designs with $r \leq 10$, together with combinatorial methods of obtaining
specific designs. Cochran and Cox (1957) give detailed plans for a selection of these
designs. C. R. Rao (1961) lists, and gives combinatorial methods for, designs with
$11 \leq r \leq 15$ (which are also included and extended in the 6th edition, 1963, of Fisher
and Yates’ Tables) and Sprott (1962) lists designs with $16 \leq r \leq 20$; Takeuchi (1962)
gives a list for $t \leq 100$ and $r \leq 20$ (30 for symmetric designs), with constructive methods;
see corrections by Sillitto (1964) and Clatworthy and Lewyckyj (1968).

Table 38.1 gives, for $t \leq 100$ and $k \leq r \leq 20$, the values of $\lambda$ for which BIB designs,
which are not unreduced, are known to exist. $k = 2$ is omitted from the table since there
is then always (cf. 38.51) an unreduced design with $\lambda = 1$. When $k = t-1$, there
is always (by 38.51 again) a symmetric unreduced design with $\lambda = t-2$, which is
also omitted from our table. Further, we may confine the table to the range $k \leq \tfrac{1}{2}t$,
since (except when $k = t-1$, already discussed) a design with $k' > \tfrac{1}{2}t$ can always be
formed from one with $k < \tfrac{1}{2}t$ as the complementary design, obtained by considering
all treatments omitted from each block as a complementary block containing $k' = t-k$
units. We then have, from (38.93-4),

$$\lambda' = \frac{(t-k)(t-k-1)}{k(k-1)}\,\lambda.$$
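The complementary design and the resulting value of λ' can be checked directly on a small case. The Python sketch below forms the complement of an illustrative design with t = 7, k = 3, λ = 1 (a well-known example supplied here, not one listed in the text) and counts how often each pair of treatments occurs together; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Illustrative design with t = 7, k = 3, lam = 1 (treatments labelled 0..6).
SEVEN_POINT_DESIGN = [{0, 1, 2}, {0, 3, 4}, {0, 5, 6}, {1, 3, 5},
                      {1, 4, 6}, {2, 3, 6}, {2, 4, 5}]

def complementary_design(blocks, t):
    """Replace each block by the set of treatments omitted from it."""
    everything = set(range(t))
    return [everything - blk for blk in blocks]

def concurrence_values(blocks, t):
    """Set of distinct counts of blocks containing each pair of treatments."""
    return {sum(1 for blk in blocks if i in blk and l in blk)
            for i, l in combinations(range(t), 2)}

def complementary_lambda(t, k, lam):
    """lam' = lam (t-k)(t-k-1) / {k(k-1)} for the complementary design."""
    return Fraction(lam * (t - k) * (t - k - 1), k * (k - 1))
```

For this example the complementary blocks each contain k' = 4 treatments and every pair occurs together in λ' = 2 of them, in agreement with the formula.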


Analysis of BIB designs 
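38.55 The analysis of an experiment designed in BIB may now be obtained by
simple substitution in the results of 38.23-6, which are valid for any block experiment.
Using (38.89) and (38.95) in (38.23), we have

$$\Omega^{-1} = r\mathbf{I}_t - \frac{1}{k}\{(r-\lambda)\mathbf{I}_t + \lambda\mathbf{1}_t\mathbf{1}_t'\} + \frac{r}{t}\mathbf{1}_t\mathbf{1}_t'
= \frac{\lambda t}{k}\left\{\mathbf{I}_t + \frac{(t-k)}{(k-1)t^2}\mathbf{1}_t\mathbf{1}_t'\right\} \tag{38.99}$$

on using (38.93-4) to eliminate $r$ and $b$. (38.99) is of exactly the same form as (38.87),
and its inverse, as there, is

$$\Omega = \frac{k}{\lambda t}\left\{\mathbf{I}_t - \frac{(t-k)}{kt(t-1)}\mathbf{1}_t\mathbf{1}_t'\right\}, \tag{38.100}$$

as the reader may verify directly.
Substituting (38.100) into (38.29) and using (38.34), we find

$$\hat{\boldsymbol{\tau}} = \frac{k}{\lambda t}(\mathbf{T} - \mathbf{n}\mathbf{B}/k) + \mathbf{1}_t G/(bk), \tag{38.101}$$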


showing that the estimators of the treatment parameters are no longer free of the 
influence of the block totals—this is as it must be, since different sets of blocks are 
associated with the various treatments. From the definitions of n and B, we see that 
nB/k is a (tx 1) vector whose ith element is the sum of the block averages over all 
blocks containing the ith treatment. Thus 

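$$\mathbf{T}_a = \mathbf{T} - \mathbf{n}\mathbf{B}/k \tag{38.102}$$

may be called the vector of adjusted treatment totals, and is evidently of direct interest.
(38.101) becomes

$$\hat{\boldsymbol{\tau}} = \frac{k}{\lambda t}\mathbf{T}_a + \mathbf{1}_t G/(bk). \tag{38.103}$$

(38.32) thus becomes, using (38.103) and (38.4),

$$\hat{\boldsymbol{\beta}} = \mathbf{B}/k - \frac{1}{\lambda t}\mathbf{n}'\mathbf{T}_a - \mathbf{1}_b G/(bk). \tag{38.104}$$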


Moreover, the treatment differences SS in the AV table (38.39) is $\mathbf{T}_a'\Omega\mathbf{T}_a$, and using
(38.100) and (38.34), this is simply

$$\mathbf{T}_a'\Omega\mathbf{T}_a = \frac{k}{\lambda t}\mathbf{T}_a'\mathbf{T}_a. \tag{38.105}$$

We thus have the following AV table, specialized from (38.39):

AV table for a BIB experiment

Source of variation     | SS                                      | D.fr.
Treatment differences   | (k/λt) T_a'T_a                          | t−1
Block effects           | B'B/k − G²/(bk)                         | b−1
Residual                | y'y − (k/λt) T_a'T_a − B'B/k            | bk−t−b+1          (38.106)
General mean            | G²/(bk)                                 | 1
TOTAL                   | y'y                                     | bk

The reader may verify that (38.106) reduces to the randomized blocks AV table
(38.52) if we put $k = t$, $\lambda = b$.
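The adjusted treatment totals and the intra-block estimators can be illustrated on error-free data, for which the estimators should reproduce the treatment parameters exactly. The Python sketch below does this for an illustrative design with t = 7, k = 3, λ = 1, r = 3, b = 7. The design, the parameter values and the function name are all assumptions made for the example, and the block effects are chosen to sum to zero so that G/(bk) equals the mean of the treatment parameters.

```python
from fractions import Fraction

# Illustrative BIB design (treatments labelled 0..6) and its constants.
DESIGN = [{0, 1, 2}, {0, 3, 4}, {0, 5, 6}, {1, 3, 5},
          {1, 4, 6}, {2, 3, 6}, {2, 4, 5}]
t, k, lam, r, b = 7, 3, 1, 3, 7

def intra_block_analysis(y):
    """y[j] maps each treatment in block j to its observation.

    Returns the estimators (k/(lam t)) T_a + G/(bk), the adjusted totals
    T_a = T - nB/k, and the treatment SS (k/(lam t)) T_a'T_a.
    """
    T = [sum(y[j][i] for j in range(b) if i in DESIGN[j]) for i in range(t)]
    B = [sum(y[j].values()) for j in range(b)]
    G = sum(B)
    T_a = [T[i] - sum(B[j] for j in range(b) if i in DESIGN[j]) / Fraction(k)
           for i in range(t)]
    tau_hat = [Fraction(k, lam * t) * T_a[i] + Fraction(G, b * k) for i in range(t)]
    treatment_ss = Fraction(k, lam * t) * sum(x * x for x in T_a)
    return tau_hat, T_a, treatment_ss

# Error-free observations y = tau_i + beta_j, with block effects summing to zero.
tau = [3, 1, 4, 1, 5, 9, 2]
beta = [2, -1, 0, 3, -2, -3, 1]
y = [{i: tau[i] + beta[j] for i in DESIGN[j]} for j in range(b)]
tau_hat, T_a, treatment_ss = intra_block_analysis(y)
```

The adjusted totals sum to zero, as they must, and the treatment SS equals (λt/k) times the sum of squared deviations of the treatment parameters from their mean, since the block effects have been removed entirely.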

Cochran and Cox (1957) and Kempthorne (1952) give detailed instructions for 
computing the AV in BIB designs, with attention to the simplifications possible in 
the resolvable, symmetrical, and other special cases. They also take into account 
the recovery of inter-block information, which we are about to discuss. See also
C. R. Rao (1947).


38.56 It might be supposed that the results of 38.55 must complete our discussion 
of the analysis of BIB designs, but this is not so. The model (38.6) on which all our
results are based is a linear model with fixed effects, i.e. we have been carrying out 
a Model I LS analysis, as in Chapter 35. So far as the treatment parameters are 
concerned, this is generally realistic, but we have seen in 38.14 that the blocks in an 
experiment are usually nuisance factors, of no direct interest. The particular blocks 
used for an experiment are not essential to it. It is not unrealistic, therefore, to con- 
sider the block effects as random variables in our analysis. In the terminology of 
Chapter 36, we are therefore about to consider a mixed model, with treatment effects 
fixed and block effects random. This not unnaturally leads to a different analysis, 
which is usually called recovery of inter-block information. The analysis which follows
is not confined to BIB designs, but holds for any block experiment. 


Mixed model for the recovery of inter-block information 

38.57 We now omit the $(b-1)$ linearly independent block parameters $\boldsymbol{\beta}$ from the
model of 38.21. Instead, we have a random block effect, say $\beta$.(*) If this has zero
mean, it will not enter into the expected value of $\mathbf{y}$, and its variance, say $\sigma_b^2$, will be
superposed upon the ordinary errors $\epsilon_{ij}$, which still have zero mean and variance $\sigma^2$.
In the notation of 38.16-21, our model is then

$$E(\mathbf{y}) = \mathbf{X}\boldsymbol{\tau}, \tag{38.107}$$

where

$$\underset{(bk \times t)}{\mathbf{X}} = \begin{pmatrix}\mathbf{X}_1 \\ \vdots \\ \mathbf{X}_b\end{pmatrix}. \tag{38.108}$$

It will be observed that we are still assuming no interactions between the block and
treatment effects.
The errors in the model are no longer uncorrelated, since any two observations
in the same block share a common value of $\beta$. If we write $\sigma_b^2/\sigma^2 = \rho$, it is easy to see
that the dispersion matrix of $\mathbf{y}$ is

$$\mathbf{V} = \sigma^2\begin{pmatrix}\mathbf{A} & & \mathbf{0} \\ & \ddots & \\ \mathbf{0} & & \mathbf{A}\end{pmatrix}, \tag{38.109}$$

where $\mathbf{V}$ has along its leading diagonal $b$ identical matrices

$$\mathbf{A} = \mathbf{I}_k + \rho\mathbf{1}_k\mathbf{1}_k'. \tag{38.110}$$

We suppose initially that $\rho$ is known; we discuss its estimation in 38.62-4.
To estimate $\boldsymbol{\tau}$, we now require the generalized LS estimator, from 19.17 (Vol. 2),

$$\hat{\boldsymbol{\tau}} = (\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}'\mathbf{V}^{-1}\mathbf{y} \tag{38.111}$$

with dispersion matrix

$$\mathbf{V}(\hat{\boldsymbol{\tau}}) = (\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1}. \tag{38.112}$$

(*) We break with our usual convention and use a Greek letter for the blocks random variable,
since b and B are already in use here.

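38.58 The inverse of (38.110) is, as at (38.87),

$$\mathbf{A}^{-1} = \mathbf{I}_k - \left(\frac{\rho}{1+k\rho}\right)\mathbf{1}_k\mathbf{1}_k', \tag{38.113}$$

and using (38.108-9) and (38.113), we have

$$\mathbf{X}'\mathbf{V}^{-1}\mathbf{X} = \frac{1}{\sigma^2}\left\{\mathbf{X}'\mathbf{X} - \left(\frac{\rho}{1+k\rho}\right)\sum_{j=1}^{b}\mathbf{X}_j'\mathbf{1}_k\mathbf{1}_k'\mathbf{X}_j\right\}.$$

Substitution of (38.5) and (38.1) reduces this to

$$\mathbf{X}'\mathbf{V}^{-1}\mathbf{X} = \frac{1}{\sigma^2}\left\{\operatorname{diag}(\mathbf{r}) - \left(\frac{\rho}{1+k\rho}\right)\sum_{j=1}^{b}\mathbf{n}_j\mathbf{n}_j'\right\}, \tag{38.114}$$

and since it may be verified from the definitions that

$$\sum_{j=1}^{b}\mathbf{n}_j\mathbf{n}_j' = \mathbf{n}\mathbf{n}', \tag{38.115}$$

(38.114) finally becomes

$$\mathbf{X}'\mathbf{V}^{-1}\mathbf{X} = \frac{1}{\sigma^2}\left\{\operatorname{diag}(\mathbf{r}) - \left(\frac{\rho}{1+k\rho}\right)\mathbf{n}\mathbf{n}'\right\} \tag{38.116}$$

and the dispersion matrix of the estimators at (38.112) is the inverse of (38.116). Also,

$$\mathbf{X}'\mathbf{V}^{-1}\mathbf{y} = \frac{1}{\sigma^2}\left\{\mathbf{X}'\mathbf{y} - \left(\frac{\rho}{1+k\rho}\right)\sum_{j=1}^{b}\mathbf{X}_j'\mathbf{1}_k\mathbf{1}_k'\mathbf{y}_j\right\},$$

and on substituting (38.18) and (38.1), this is

$$\mathbf{X}'\mathbf{V}^{-1}\mathbf{y} = \frac{1}{\sigma^2}\left\{\mathbf{T} - \left(\frac{\rho}{1+k\rho}\right)\sum_{j=1}^{b}\mathbf{n}_j B_j\right\} = \frac{1}{\sigma^2}\left\{\mathbf{T} - \left(\frac{\rho}{1+k\rho}\right)\mathbf{n}\mathbf{B}\right\}. \tag{38.117}$$

Using (38.116-17), (38.111) becomes

$$\hat{\boldsymbol{\tau}} = \left\{\operatorname{diag}(\mathbf{r}) - \left(\frac{\rho}{1+k\rho}\right)\mathbf{n}\mathbf{n}'\right\}^{-1}\left\{\mathbf{T} - \left(\frac{\rho}{1+k\rho}\right)\mathbf{n}\mathbf{B}\right\}. \tag{38.118}$$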


38.59 As $\sigma_b^2 \to 0$ ($\rho \to 0$), (38.118) becomes

$$\lim_{\rho \to 0}\hat{\boldsymbol{\tau}} = \{\operatorname{diag}(\mathbf{r})\}^{-1}\mathbf{T},$$

and the dispersion matrix of $\hat{\boldsymbol{\tau}}$ is then $\sigma^2\{\operatorname{diag}(\mathbf{r})\}^{-1}$. These are precisely the results
which we should have got if block effects had been completely ignored, and this is
as it should be since, as $\sigma_b^2 \to 0$ with $E(\beta) = 0$, the block effects disappear.

At the other extreme, when $\sigma_b^2$ (and $\rho$) $\to \infty$, we must rewrite (38.118) to avoid
the matrix inversion, since now $\rho/(1+k\rho) \to 1/k$ and $\{\operatorname{diag}(\mathbf{r}) - \mathbf{n}\mathbf{n}'/k\}$ is singular, as may
be seen by postmultiplying it by $\mathbf{1}_t$ and using (38.3-4). Instead of (38.118), we use
the more general

$$\{\operatorname{diag}(\mathbf{r}) - \mathbf{n}\mathbf{n}'/k\}\hat{\boldsymbol{\tau}} = \mathbf{T} - \mathbf{n}\mathbf{B}/k. \tag{38.119}$$

(38.119) is satisfied by the estimator $\hat{\boldsymbol{\tau}}$ at (38.29), as may be seen by pre-multiplying
(38.24) by $\Omega^{-1}$ and using (38.31). Thus, when block effects are large, the estimators
(38.119) tend to coincide with the intra-block estimators (38.29). Paradoxical though
it may sound, the change to the mixed model only affects the estimators substantially
when block effects are small.

It is easy to see that the intra-block estimators remain unbiassed in the mixed 
model, by arguing that since they are so for any fixed set of block effects, they must 
be unconditionally so when the latter become random variables. If the block effects 
are large, one is then tempted to use the simpler intra-block analysis, since the change 
in the estimators will, as we have just seen, be small. However, the dispersion matrix 
of the estimators is now the inverse of (38.116), instead of (38.40). 


38.60 It is not quite obvious that in the case of randomized blocks there is no 
change at all in the estimators obtained by use of the mixed model rather than the 
fixed-effects one. Exercise 38.14 shows that the estimators (38.118) coincide with 
the intra-block estimators (38.48) for randomized blocks, but that the dispersion 
matrix of the estimators is changed in the mixed model, their variances being increased. 
(This is obvious from the fact that any treatment parameter estimator $T_i/b$ is the mean
of $b$ independent observations with variance $\sigma^2(1+\rho)$ by (38.109-10).) This is essenti-
ally because there is no general mean parameter in our models. If there were, the 
estimators would have the same distribution in each model (cf. Exercise 38.14). 
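The coincidence of the mixed-model and intra-block estimators for randomized blocks can be checked numerically. The Python sketch below computes the estimator {diag(r) − w nn'}⁻¹{T − w nB} with w = ρ/(1+kρ) by exact Gauss-Jordan elimination; the function names and the data are hypothetical, and the code assumes every block contains the same number k of units.

```python
from fractions import Fraction

def solve(M, v):
    """Solve M x = v by Gauss-Jordan elimination with exact fractions."""
    size = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(v[i])] for i, row in enumerate(M)]
    for col in range(size):
        pivot = next(i for i in range(col, size) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for i in range(size):
            if i != col and aug[i][col] != 0:
                factor = aug[i][col]
                aug[i] = [x - factor * p for x, p in zip(aug[i], aug[col])]
    return [row[size] for row in aug]

def mixed_model_estimates(n, T, B, rho):
    """Generalized LS estimator of the treatment parameters for known rho.

    n is the t x b incidence matrix, T the treatment totals, B the block totals.
    """
    t, b = len(n), len(n[0])
    k = sum(n[i][0] for i in range(t))      # units per block, assumed equal
    w = Fraction(rho) / (1 + k * Fraction(rho))
    r = [sum(row) for row in n]
    nn = [[sum(n[i][j] * n[l][j] for j in range(b)) for l in range(t)]
          for i in range(t)]
    M = [[r[i] * (i == l) - w * nn[i][l] for l in range(t)] for i in range(t)]
    v = [T[i] - w * sum(n[i][j] * B[j] for j in range(b)) for i in range(t)]
    return solve(M, v)

# Randomized blocks: every treatment appears once in each of b = 4 blocks.
n = [[1, 1, 1, 1] for _ in range(3)]
T = [42, 82, 128]          # treatment totals of the invented data
B = [60, 66, 62, 64]       # block totals of the same data
```

Whatever value of ρ is supplied, the result for this layout should be the vector of treatment means T_i/b, here 21/2, 41/2 and 32.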


38.61 Yates’ (1939, 1940a) original treatment of the recovery of inter-block information
proceeded differently, by observing that inter-block estimators of treatment parameters
could be obtained from the block totals, that these estimators were uncorrelated
with the intra-block estimators, and that the two estimators could therefore be simply 
weighted to give the smallest attainable variance by this method—the generalized 
version of this approach is left to the reader as Exercise 38.15. "The two different 
approaches to recovery of inter-block information are not obviously equivalent, since 
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the weighting of the two components in Exercise 38.15 is determined with reference 
to the dispersion matrix of the intra-block estimators in the original (fixed-effects) 
model, while the dispersion matrix of the inter-block estimators is for the mixed model. 
Nevertheless, Exercise 38.16 shows that the two methods always give the same estimator. 


38.62 The MV estimator $\hat{\tau}$ at (38.118) is a function of the variance ratio
$\rho = \sigma_b^2/\sigma^2$, and this must usually itself be estimated by some $\hat{\rho}$ so that
$\hat{\tau}(\hat{\rho})$ may be used. We first estimate $\sigma_b^2$ and $\sigma^2$ separately and then take
the ratio as the estimator of $\rho$.

To find suitable estimators of $\sigma_b^2$ and $\sigma^2$, we return to the general analysis of 38.19-24,
but we now wish to find an SS attributable to blocks rather than to treatment differences,
as in 38.25. We therefore find the Residual SS, say $S_0$, when there are no block
effects. The difference $S_0 - S_R$, where $S_R$ is the Residual SS of the full model, will then
be the SS attributable to block effects.


38.63 We put $\beta = 0$ in (38.32), and obtain

$$B = n'\hat{\tau}_0, \qquad (38.120)$$

where $\hat{\tau}_0$ means $(\hat{\tau})_{\beta=0}$. Premultiplying by $1'$ gives, from (38.25) and (38.3),

$$G = r'\hat{\tau}_0. \qquad (38.121)$$

If we now substitute (38.120-1) into (38.24), we obtain

$$\hat{\tau}_0 = \Omega\{T - nn'\hat{\tau}_0/k + rr'\hat{\tau}_0/(bk)\}
             = \Omega\{T + [\Omega^{-1} - \mathrm{diag}(r)]\hat{\tau}_0\} \qquad (38.122)$$

on using (38.23). Solving (38.122) for $\hat{\tau}_0$ gives

$$(\hat{\tau})_{\beta=0} = \{\mathrm{diag}(r)\}^{-1}T. \qquad (38.123)$$

We then have, in (38.33),

$$S_0 = y'y - T'\{\mathrm{diag}(r)\}^{-1}T, \qquad (38.124)$$

and we find

$$S_0 - S_R = T'\hat{\tau} + B'\hat{\beta} - T'\{\mathrm{diag}(r)\}^{-1}T
           = (T - nB/k)'\,\Omega\,(T - nB/k) + B'B/k - T'\{\mathrm{diag}(r)\}^{-1}T, \qquad (38.125)$$

using the first row of the AV table (38.39). (38.125) is the required SS attributable
to blocks. We thus have the AV table, alternative to (38.39):

Source of variation               | SS                                                        | D.fr.
Block effects (allowing for       | $(T - nB/k)'\Omega(T - nB/k) + B'B/k$                     | b-1
  treatment differences)          |   $- T'\{\mathrm{diag}(r)\}^{-1}T$                        |
Treatment differences (ignoring   | $T'\{\mathrm{diag}(r)\}^{-1}T - G^2/(bk)$                 | t-1
  block effects)                  |                                                           |
Residual                          | $y'y - T'\hat{\tau} - B'\hat{\beta}$                      | bk-b-t+1
General mean                      | $G^2/(bk)$                                                | 1
Total                             | $y'y$                                                     | bk
                                                                                     (38.126)

The Residual and the General mean rows are unchanged, as they must be, but the
remaining SS has been differently apportioned between treatment and block effects 
because here treatment differences, rather than block effects, are taken into account 
first. It will be remembered from Example 35.4 that the order is irrelevant only in 
an orthogonal analysis. 


38.64 As usual, we may now estimate $\sigma^2$ by the Residual SS $S_R$ divided by its
d.fr., since

$$E(S_R) = (bk-b-t+1)\sigma^2. \qquad (38.127)$$

To estimate $\sigma_b^2$, we first observe from (38.124) that the SS due to Blocks ($S_B$, say)
may be written $S_0 - S_R = S_B = y'y - S_R - T'\{\mathrm{diag}(r)\}^{-1}T$, so that, using (38.127),

$$E(S_B) = E(y'y) - (bk-b-t+1)\sigma^2 - E[T'\{\mathrm{diag}(r)\}^{-1}T]. \qquad (38.128)$$

From the result of Exercise 19.3, we know that if $E(z) = 0$ and $V(z) = \sigma^2 W$, then
$E(z'Az) = \sigma^2\,\mathrm{tr}(AW)$. Thus, assuming without loss of generality that $\tau = 0$, we have from
(38.109-10)

$$E(y'y) = \sigma^2 bk(1+\rho). \qquad (38.129)$$

The model (38.107-10) implies that

$$E(T) = \{\mathrm{diag}(r)\}\tau, \qquad V(T) = \sigma^2[\mathrm{diag}(r) + \rho\,nn'],$$

remembering the properties of $nn'$ in 38.48. We may thus again apply the result
of Exercise 19.3 to obtain

$$E[T'\{\mathrm{diag}(r)\}^{-1}T] = \sigma^2\,\mathrm{tr}[\{\mathrm{diag}(r)\}^{-1}(\mathrm{diag}(r) + \rho\,nn')]
                               = \sigma^2\,\mathrm{tr}[I + \rho\{\mathrm{diag}(r)\}^{-1}nn']
                               = \sigma^2\{t + \rho\,\mathrm{tr}[\{\mathrm{diag}(r)\}^{-1}nn']\}.$$

It is easy to verify from the definitions that

$$\mathrm{tr}[\{\mathrm{diag}(r)\}^{-1}nn'] = t, \qquad (38.130)$$

so that

$$E[T'\{\mathrm{diag}(r)\}^{-1}T] = \sigma^2 t(1+\rho), \qquad (38.131)$$

and (38.129) and (38.131) reduce (38.128) to

$$E(S_B) = (b-1)\sigma^2 + (bk-t)\rho\sigma^2. \qquad (38.132)$$

Thus, from (38.132) and (38.127),

$$\hat{\sigma}_b^2 = \frac{S_B - (b-1)S_R/(bk-b-t+1)}{bk-t}. \qquad (38.133)$$

(38.133) and (38.127) give the required estimators. Their ratio, say $\hat{\rho}$, has no optimum
property in this context, where estimation of $\tau$ is of interest. Yates (1939, 1940a)
truncated the estimator of $\sigma_b^2$, replacing negative values by zero. The resulting estimator
is $\tilde{\rho}$, say. Tocher (1952) gives another, more complicated, estimator of $\sigma_b^2$ which, if the
errors have zero skewness and kurtosis, is the MV unbiassed quadratic estimator. But
what is really required is an estimator of $\rho$ whose use allows the treatment parameter
estimator (38.118) to remain unbiassed.

Graybill and Weeks (1959) and Graybill and Seshadri (1960) show that for
BIB designs, $\hat{\tau}$ at (38.118) remains unbiassed when $\hat{\rho}$ or $\tilde{\rho}$ is used in it; Seshadri (1966)
shows that the variance is smaller when $\tilde{\rho}$ is used. J. Roy and Shah (1962) show that
if the estimator of $\rho$ is of a certain type in terms of the latent roots of $nn'$, unbiassedness
of $\hat{\tau}$ is preserved in any incomplete blocks design with $r = r1_t$; and that is of the
required form. Shah (1964) constructs other unbiassed estimators in several well-known
designs, including the BIB with 155.
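
The moment estimators of 38.64 can be sketched numerically. The following is a minimal illustration, assuming that the Residual SS and the Blocks SS (allowing for treatments) have already been computed; the function name, argument names and the numerical values are hypothetical and are not taken from the text.

```python
def variance_component_estimates(resid_ss, block_ss, b, k, t):
    """Moment estimators following (38.127) and (38.133).

    resid_ss : Residual SS of the full model
    block_ss : SS for blocks, allowing for treatment differences
    b, k, t  : number of blocks, block size, number of treatments
    """
    resid_df = b * k - b - t + 1
    sigma2_hat = resid_ss / resid_df                      # from (38.127)
    sigma_b2_hat = (block_ss - (b - 1) * sigma2_hat) / (b * k - t)  # (38.133)
    rho_hat = sigma_b2_hat / sigma2_hat                   # untruncated ratio
    rho_truncated = max(rho_hat, 0.0)                     # negative values set to zero
    return sigma2_hat, sigma_b2_hat, rho_hat, rho_truncated


# Hypothetical values for a design with t = 4, b = 4, k = 3.
print(variance_component_estimates(resid_ss=10.0, block_ss=22.0, b=4, k=3, t=4))
```

The truncated ratio differs from the untruncated one only when the estimated block variance is negative.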


Stein (1966) discusses recovery of inter-block information in BIB designs when the 
block effects arise through randomization only. 


Permutation distributions for BIB designs 

38.65 After the general discussion of the mixed model in 38.57-64, we now 
revert to our earlier fixed-effects model for BIB experiments, and turn our attention to 
permutation tests for treatment effects. 

Ogawa (1963) shows that if there are unit errors (cf. 36.41) the standard F-test 
for treatments may be justified as an approximation to the permutation distribution 
if b is large enough and the variances of unit effects within blocks are nearly constant. 
A fortiori, this holds if there are no unit errors and b is large.

If ranks are used within blocks instead of the observations, we may generalize to BIB
designs the permutation distribution of the test statistic for treatment effects,
discussed in 38.43 and 37.39-41 for randomized blocks. The results, due to Durbin
(1951), are given in Exercise 38.17. Van Elteren and Noether (1959) showed that,
compared to the usual F-test for treatment effects, Durbin's test using ranks has ARE
exactly $k/(k+1)$ times the Wilcoxon ARE (31.115), reducing to $3k/\{\pi(k+1)\}$ in the
normal case. It will be seen that the ARE depends on block size, but not upon t.
See also P. K. Sen (1967) in the randomized blocks case.
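
As a small numerical illustration of the last result, the normal-case ARE $3k/\{\pi(k+1)\}$ can be tabulated against block size. This is a sketch only; the function name is our own.

```python
import math


def durbin_are_normal(k):
    """ARE of the rank test relative to the F-test for normal errors: 3k / (pi (k + 1))."""
    return 3 * k / (math.pi * (k + 1))


for k in (2, 3, 5, 10, 50):
    print(k, round(durbin_are_normal(k), 4))
```

For k = 2 the expression equals $2/\pi$, and as k increases it approaches the Wilcoxon value $3/\pi$.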

It is interesting to note that here only the first two moments of the test statistic 
can be generally obtained, precisely because, as we mentioned at the end of 38.50, 
the BIB conditions lay down no pattern for the appearance of the treatments in sets 
of more than two. 


Benard and van Elteren (1953) give a large-sample chi-square permutation test for 
an arbitrary (not necessarily balanced) incomplete blocks design using ranks, repeated 
as well as missing observations being allowed. 


Preference experiments 

38.66 BIB designs are of interest in connexion with preference experiments
(where measurements of degree of preference are often not possible, but rankings of
preferences are). If preferences are to be expressed within b blocks of k objects
(treatments) selected from t, the order in which these objects are examined may be
important, and it is desirable to arrange the BIB to take this order-effect into account.
A simple way is to let the objects be examined in the orders determined by the column
positions of a (t × t) Latin square. If the first k columns of the square are used, they
determine order in t blocks of k objects, each of the t possible objects appearing once
in each position in the ordering. If b = ct, where c is a positive integer, we can obtain
complete order-balance by using the first k columns of c (t × t) Latin squares in this
way.

Thus, e.g., the first three columns of (38.71) give a BIB design with t = 4, b = 4,
k = 3 and $\lambda = 2$. Each of the letters A, B, C, D occurs once in each column position.

An incomplete Latin square, used in this way, is known as a Youden square design,
after its discoverer, W. J. Youden. Of course, position-balance may also be important
in any (not necessarily preference) BIB experiment, where "position" may stand
for any variable whose "nuisance" effect we wish to exclude from the experiment,
just as in our original discussion of Latin squares (cf. 38.31 and 38.34).
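
The t = 4, k = 3 example can be checked mechanically. The sketch below assumes a cyclic 4 × 4 Latin square, which need not be the square printed at (38.71), and uses hypothetical helper names; it counts how often each pair of objects shares a block and how often each object occupies each position. The balance shown is claimed only for this case, not for the first k columns of an arbitrary Latin square.

```python
from collections import Counter
from itertools import combinations


def cyclic_latin_square(t):
    """A t x t Latin square with entry (i + j) mod t in row i, column j."""
    return [[(i + j) % t for j in range(t)] for i in range(t)]


def youden_blocks(t, k):
    """Blocks given by the first k columns; each row lists objects in examination order."""
    return [row[:k] for row in cyclic_latin_square(t)]


def pair_counts(blocks):
    """Number of blocks in which each unordered pair of objects appears together."""
    counts = Counter()
    for block in blocks:
        for pair in combinations(sorted(block), 2):
            counts[pair] += 1
    return counts


def position_counts(blocks, t):
    """Entry [obj][pos] counts how often object obj is examined in position pos."""
    k = len(blocks[0])
    return [[sum(1 for blk in blocks if blk[pos] == obj) for pos in range(k)]
            for obj in range(t)]


blocks = youden_blocks(4, 3)
print(blocks)
print(dict(pair_counts(blocks)))
print(position_counts(blocks, 4))
```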


Paired comparisons 

38.67 The particular case k = 2 in a BIB design (when, as we saw in 38.51, the
design is unreduced) is usually described as a paired comparisons design, and is of
particular importance in preference experiments. H. A. David (1963) has recently
devoted a monograph to methods for paired comparisons, which includes a chapter
on appropriate experiment designs. Perhaps the most important of these are the
linked paired-comparison designs developed by Bose (1956). In these designs, each
of t judges compares r pairs of objects (chosen from a total of n objects). Each pair
is compared by k judges, and there are exactly $\lambda$ pairs in common to any two judges.
As the notation indicates, there is a correspondence with BIB designs: each judge
is a "treatment," each pair of objects a block "containing" k such treatments. There
are $b = \tfrac{1}{2}n(n-1)$ blocks in all. The new feature of the linked paired-comparison
designs is that we require each of the n objects to appear equally frequently in the
r pairs of each judge, i.e. to appear $2r/n = \alpha$ times. Thus, by (38.94), we have

$$\alpha = k(n-1)/t. \qquad (38.134)$$

Because of the additional condition (38.134), the existence of a BIB design does not
imply the existence of a linked paired-comparison design, although the latter clearly
implies the former. Bose (1956) gives (and David (1963) reproduces) methods for
deriving linked paired-comparison designs from BIB designs.
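
The arithmetic behind (38.134) can be sketched as follows, assuming that the relation used from (38.94) is the usual BIB identity bk = rt. The function name and the trial values of n, t and k are our own. A non-integral $\alpha$ rules out a linked design, although an integral $\alpha$ does not by itself guarantee that one exists.

```python
from fractions import Fraction


def linked_design_constants(n, t, k):
    """Return (b, r, alpha) for n objects, t judges and k judges per pair of objects."""
    b = Fraction(n * (n - 1), 2)   # number of pairs of objects, i.e. blocks
    r = b * k / t                  # pairs compared by each judge, from bk = rt
    alpha = 2 * r / n              # appearances of each object among one judge's pairs
    return b, r, alpha


for n, t, k in [(4, 3, 2), (4, 4, 2)]:
    b, r, alpha = linked_design_constants(n, t, k)
    print(n, t, k, b, r, alpha, alpha.denominator == 1)
```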


Partially balanced incomplete blocks 
38.68 The essential feature of BIB designs (cf. 38.49) is complete symmetry
between the treatments, each of which appears r times in all and $\lambda$ times with any
other treatment. This symmetry was a natural consequence of the symmetrical demand
for the same precision in all treatment-difference estimators in 38.46-7. While
maintaining the condition that each treatment appears r times, we now relax the condition
that $\lambda$ be constant.

Suppose that for each treatment the remaining (t - 1) treatments fall into m classes
of size $t_p$, with $\sum_{p=1}^{m} t_p = t - 1$. These are called associate classes, and any treatment in
the pth associate class is called a pth associate of the given treatment. We now require
that
(a) all pth associates appear together in the same block $\lambda_p$ times;
(b) if A is a pth associate of B, B is a pth associate of A;
(c) the number of treatments which are both pth associates of a treatment A,
and qth associates of another treatment B, is the same for all lth associates
A, B. We write this number as $P^{l}_{pq}$.

A design satisfying these conditions is called a partially balanced incomplete
blocks (PBIB) design. These designs were first considered by Bose and Nair (1939).
Evidently, they contain BIB designs as the special case when $\lambda_p = \lambda$ for all p.


38.69 PBIB designs have many constants: t, r, b and k as before; the m values
$\lambda_p$ and the m values $t_p$; and the $P^{l}_{pq}$. There are, however, linear relations between
these constants which reduce their effective number.

The case of two associate classes (m = 2) has been studied in detail by Bose et al.
(1954) who give tables of all known designs with $r \le 10$, $3 \le k \le 10$. Even in this
simplest case, there are five types of association scheme (i.e. the scheme which sets out 
the associate relationships between the treatments), the most important type being the 
group divisible PBIB designs, which themselves consist of three sub-types. 

Guérin (1965) gives an extensive summary of the main results on the existence 
and construction of PBIB designs, with a comprehensive bibliography. See also the 
mathematical treatment by Vajda (1967b). Details of the appropriate methods of 
statistical analysis, including recovery of inter-block information, are given by Bose 
et al. (1954), by Kempthorne (1952), and by C. R. Rao (1947). 


Structured treatments: lattice designs 

38.70 Throughout our treatment of experiment designs, we have made no assumptions
concerning relationships between treatments. Now we suppose that the treatments
in the experiment are classified into certain categories. This is the case, e.g.,
when the treatments are the rc combinations in a two-way cross-classification, the
treatments then falling naturally into a two-way table. Block experiments taking
account of such classifications are called lattice designs, and were introduced by Yates
(1936b, 1939, 1940b). They are of particular value when the number of treatments,
t, is large, for the table in 38.54 shows how few BIB arrangements are then available.


38.71 Suppose that the t treatments are arranged in a (l × k) two-way array, so
that t = lk. We might then use the k treatments in each of the l rows within a single
block; and similarly the l treatments in each of the k columns within a single block.
We thus obtain a design containing l blocks of k units and k blocks of l units. This is
called a rectangular lattice design, the name arising from the fact that the treatments
may be represented as the points of a lattice in the (l × k) array. To bring this within
the scope of our discussion, which has throughout been limited to blocks of equal
size, we must restrict ourselves to the square lattice where k = l. With two replicates
of the treatments as above, it is sometimes called a simple lattice; with three replications,
a triple lattice; with four replications, a quadruple lattice, and so on.

The square lattice is not a balanced design in the sense that a BIB is, for although
every treatment appears r = 2 times, it is not true that every treatment appears equally
frequently in the same block with every other treatment. The reader will see at once
that the frequency of joint appearance $\lambda_{ij}$ is unity for treatments in the same row or
column of the (k × k) array, and zero for all other pairs of treatments.

One can evidently generalize the two-dimensional (k × k) array of treatments to
a p-dimensional array containing $k^p$ treatments. We then have $k^{p-1}$ blocks of k units
in a single replication of the treatments. Such p-dimensional lattice designs are again
not balanced. The cubic lattice (p = 3) is important in some applications.


38.72 To avoid the more complicated analysis attending upon lack of balance,
we may construct balanced lattice designs by replicating the simple lattice arrangements
just discussed. In the case $t = k^2 = 9$, the 3 × 3 array

    1  2  3
    4  5  6        (38.135)
    7  8  9

yields the simple square lattice design as described in 38.71 with six blocks (each
column below being a block)

    1 | 4 | 7 | 1 | 2 | 3
    2 | 5 | 8 | 4 | 5 | 6        (38.136)
    3 | 6 | 9 | 7 | 8 | 9

arranged in two complete replications of the treatments. If we now add two further
replications

    1 | 2 | 3 | 1 | 2 | 3
    5 | 6 | 4 | 6 | 4 | 5        (38.137)
    9 | 7 | 8 | 8 | 9 | 7

the ensemble of (38.136-7) is fully balanced, as the reader may verify. In fact, it
forms a BIB with t = 9, k = 3, $\lambda = 1$, b = 12, r = 4.

The four complete replications in (38.136-7) may be used to form a set of lattice
squares, derived as in 38.71 from (38.135) and the further array

    1  5  9
    6  7  2        (38.138)
    8  3  4

These designs are more valuable when $k = t^{1/2}$ is odd, only $\tfrac{1}{2}(k+1)$ squares then being
required, as in our example. When k is even, k+1 squares are needed to form a set.
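
The verification left to the reader in 38.72 can be sketched numerically. The code below takes the rows and columns of the two 3 × 3 arrays (38.135) and (38.138) as blocks and forms the matrix $nn'$ of joint appearances; the helper names are hypothetical.

```python
import numpy as np


def rows_and_columns(array):
    """Blocks formed from the rows and then the columns of a square array of treatments."""
    return [list(row) for row in array] + [list(col) for col in array.T]


def concurrence_matrix(blocks, t):
    """nn': entry (i, j) is the number of blocks containing both treatments i+1 and j+1."""
    n = np.zeros((t, len(blocks)), dtype=int)
    for j, block in enumerate(blocks):
        for treatment in block:
            n[treatment - 1, j] = 1
    return n @ n.T


def off_diagonal(matrix):
    """All entries of a square matrix except those on the main diagonal."""
    return matrix[~np.eye(len(matrix), dtype=bool)]


first = np.arange(1, 10).reshape(3, 3)                  # the array (38.135)
second = np.array([[1, 5, 9], [6, 7, 2], [8, 3, 4]])    # the array (38.138)

simple_lattice = rows_and_columns(first)                # two replications
balanced = simple_lattice + rows_and_columns(second)    # four replications

print(sorted(set(off_diagonal(concurrence_matrix(simple_lattice, 9)))))
print(sorted(set(off_diagonal(concurrence_matrix(balanced, 9)))))
print(np.diag(concurrence_matrix(balanced, 9)))
```

In the simple lattice the joint frequencies take both values 0 and 1, whereas in the four-replicate design every pair of treatments appears together exactly once and every treatment appears four times.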


38.73 Details of the theory and analysis of lattice designs, into which we have 
no space to enter here, are given by Kempthorne (1952) and by Cochran and Cox 
(1957). Their importance for our exposition is that they have led us to consider 
a set of treatments which are “ structured,” at least to the extent of being arranged 
in categories. If we pursue this line, and turn our attention to treatments which 
are combinations of underlying elements, we are led into new territory. 


Factorial experiments 

38.74 If the treatments in an experiment consist of all possible combinations 
of a set of underlying factors, it is called a factorial experiment. Such experiments 
are formally the same as the (complete) cross-classifications discussed in detail in 
35.15-33 and 35.40-4 (although we did not use the terminology of experiments in 
earlier chapters because the analyses given there are also applicable to non-experimental 
situations). Each defining variable of the classification (i.e. the row-variable,
column-variable, etc.) is called a factor in the experiment. Each value that a factor can take,
which defines an individual cell of the marginal classification by that factor, is called
a level of the factor. Thus what we previously called a (r × c) cross-classification
may be described in this context as a factorial experiment with two factors, one at r
levels and the other at c levels; and a (r × c × l) cross-classification is a factorial
experiment with three factors, at r, c and l levels respectively. More concisely, these two
examples could be described as a (r × c) factorial and as a (r × c × l) factorial respectively;
if r = c, the former would be called a ($r^2$) factorial, and so on.


38.75 If a factorial experiment is carried out in a randomized blocks design, we
naturally wish to subdivide the treatments SS (with (t-1) d.fr.) into component
parts for main effects and interactions. Thus, e.g., in a (r × c) factorial experiment
with one observation per cell, we should have t = rc, and the (rc-1) d.fr. for treatment
differences are to be resolved into r-1, c-1 and (r-1)(c-1) as in Example
35.3. More generally, the treatment differences SS are to be subdivided into components
for all the main effects and the different-order interactions, as in Chapter 35.
We call this subdivision an AV for treatments.

It will be seen from (38.52) that the treatment differences SS is $\frac{1}{b}\{T'T - G^2/t\}$,
so that $T'T/b$ now plays the same role, for a factorial experiment in randomized blocks,
as $y'y$ did in Chapter 35. Thus an AV for treatments may be carried out upon the
treatment totals $T_j$ by exactly the methods of Chapter 35, reading t as n. It is necessary
only to remember to divide all the component SS by b, this divisor arising, of course,
because each $T_j$ is the sum of b observations.
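
The rule of 38.75 can be sketched for a (r × c) factorial in b randomized blocks. The code below is an illustration under our own naming, with simulated data: it carries out the two-way subdivision on the table of treatment totals and divides every component by b.

```python
import numpy as np


def factorial_treatment_av(y):
    """AV for treatments from an array y of shape (b, r, c).

    y[h, i, j] is the observation in block h on the treatment with row-factor
    level i and column-factor level j. Returns the component SS, each divided by b.
    """
    b, r, c = y.shape
    totals = y.sum(axis=0)                 # treatment totals, each a sum of b observations
    grand_total = totals.sum()
    t = r * c

    ss_treatments = (np.sum(totals ** 2) - grand_total ** 2 / t) / b

    row_means = totals.mean(axis=1, keepdims=True)
    col_means = totals.mean(axis=0, keepdims=True)
    overall = totals.mean()

    ss_rows = c * np.sum((row_means - overall) ** 2) / b            # r - 1 d.fr.
    ss_cols = r * np.sum((col_means - overall) ** 2) / b            # c - 1 d.fr.
    ss_inter = np.sum((totals - row_means - col_means + overall) ** 2) / b
    return {"treatments": ss_treatments, "rows": ss_rows,
            "columns": ss_cols, "interaction": ss_inter}


rng = np.random.default_rng(0)             # simulated data, for illustration only
example = factorial_treatment_av(rng.normal(size=(4, 2, 3)))
print(example)
```

The three components add up to the treatment differences SS, as they must.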


38.76 From Exercise 38.6 it will be seen that the same simple AV for treatments
in terms of the $T_j$ may be carried out for factorial experiments in Latin square designs,
the divisor here being t (instead of b) which is again the number of observations of
which $T_j$ is the sum. The same rule holds for the generalization of Exercise 38.6 to
Graeco-Latin and higher-order orthogonal square designs.


Confounding 

38.77 So far, the subdivision of the treatment differences SS has been simplicity 
itself, because in 38.75-6 we have considered only factorial experiments (themselves 
of simple structure) in the simplest block designs. We now remove both these limitations,
and return to the consideration of experiments upon a set of related treatments
in the general block design. 

A glance at the treatment differences SS in the general block experiment AV table 
(38.39) will show the reader that we cannot now expect the simplicity of 38.75-6 to 
persist, for the allocation of treatments to blocks vitally affects the treatment differences 
SS through B. This remains true even for the BIB table (38.106), remembering the
definition of Т, at (38.102). 

A little thought will convince the reader that this is inevitable, for if the treatments 
do not all appear in the same blocks, their effects must become entangled with those 
of the blocks themselves, and balance (in the BIB sense) is not enough to preclude this. 
Indeed, a new problem now arises, for there is evidently a danger that important
components of the treatment differences SS (e.g. main effects and first-order interactions)
cannot be estimated because they are inextricably entangled with the block effects.
They are then said to be confounded with blocks.


38.78 Rather than confine ourselves to the SS in the AV table, we consider quite
generally the estimation of p linear functions of the treatment parameters, say $C\tau$,
where C is a (p × t) matrix of known coefficients. In particular, we are interested in
contrasts in the parameters, where (cf. 35.58) the elements in each row of C sum to
zero. Contrasts, it will be recalled, include simple differences, and also interactions,
between the treatment parameters.

Inspection of our model for block experiments at (38.12-14) shows that

$$C\tau = (C \,\vdots\, 0)\,\theta, \qquad (38.139)$$

where 0 is a (p × (b-1)) matrix of zeros which serves to annihilate the block
parameters.

Now in order that a vector $Ly$ be unbiassed in estimating (38.139), it is necessary
and sufficient by (19.19) that

$$LX = (C \,\vdots\, 0),$$

i.e., using (38.13), that the columns of $LX$ belonging to the treatment parameters
reproduce C, while those belonging to the block parameters vanish.        (38.140)

Thus, if we can find a matrix L, of order (p × bk), which satisfies (38.140), there will
be no confounding of the p linear functions $C\tau$.
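
The solvability of $LX = (C \,\vdots\, 0)$ can be examined numerically for a single contrast by a rank comparison. The sketch below is an illustration of that condition, not the method of the text: it assumes a design matrix with t treatment-indicator columns followed by b-1 block-indicator columns (our own parameterization), and takes a single replication of a $2^3$ factorial in two blocks of four, the blocks being defined by the parity of the sum of the factor levels.

```python
import itertools
import numpy as np


def design_matrix(blocks, t):
    """One row per unit: t treatment indicators, then b-1 block indicators (assumed form)."""
    b = len(blocks)
    rows = []
    for j, block in enumerate(blocks):
        for i in block:
            row = np.zeros(t + b - 1)
            row[i] = 1.0
            if j < b - 1:
                row[t + j] = 1.0
            rows.append(row)
    return np.array(rows)


def is_unconfounded(X, c):
    """True if some vector l satisfies l'X = (c' : 0), i.e. (c' : 0) lies in the row space of X."""
    target = np.concatenate([c, np.zeros(X.shape[1] - len(c))])
    return np.linalg.matrix_rank(np.vstack([X, target])) == np.linalg.matrix_rank(X)


levels = list(itertools.product([0, 1], repeat=3))      # the 8 treatment combinations
blocks = [[i for i, lv in enumerate(levels) if sum(lv) % 2 == 0],
          [i for i, lv in enumerate(levels) if sum(lv) % 2 == 1]]
X = design_matrix(blocks, t=8)

main_effect_A = np.array([2 * lv[0] - 1 for lv in levels], dtype=float)
interaction_ABC = np.array([np.prod([2 * x - 1 for x in lv]) for lv in levels], dtype=float)

print(is_unconfounded(X, main_effect_A))
print(is_unconfounded(X, interaction_ABC))
```

With this allocation the three-factor interaction contrast is constant within each block and is therefore confounded, while the main effect of the first factor is not.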


38.79 The equations (38.140) impose p(t+b-1) conditions upon the pbk elements
of L, and whatever p may be, these can certainly be satisfied if $t+b-1 \le bk$, i.e. if
$b \ge (t-1)/(k-1)$. The equations may also be satisfiable for some values of p if
$b < (t-1)/(k-1)$, for the conditions upon L are not necessarily independent ones;
this depends on the structure of the block design. The case t = bk (where the
experiment consists of a single replication of the set of treatments) falls into this category,
for $t/k < (t-1)/(k-1)$ for t > k, while t = k is trivial in this context.

We therefore see that, if there are enough blocks, we can always, if we wish, avoid 
confounding any set of linear functions of the treatment parameters. This is intuitively 
obvious from the fact that if certain functions are confounded in a given set of blocks, 
we may deliberately add a further set of blocks in which they are not confounded. 
This is so in particular where each set of blocks is a replication of the treatments. In 
the literature, functions confounded in part, but not all, of the experiment are said to
be partially confounded. 


38.80 We have so far discussed confounding as though it were an evil to be avoided. 
It is undoubtedly a nuisance to be unable to estimate some functions of the treatment 
parameters, and even in the case of partial confounding there are computational
complications which may be irksome, while naturally the precision of the estimators of the
partially confounded functions must be reduced. 

However, confounding also has its positive aspect in factorial experiments. It 
will be remembered from 35.44 that the higher-order interactions are commonly 
found to be of little practical value. They are therefore often deliberately confounded 
with the blocks in an experiment, the consequence being that their SS and d.fr. appear 
as part of the Residual. Of course, we may carry out precisely the same merging 
process in an unconfounded analysis, as indicated at 35.44. The point here is that 
within a given framework of experiment (t, k and b), it may not be possible to estimate
all the desired linear functions of the treatment parameters, namely the main effects 
and interactions. If some of these must be confounded, it is in general advantageous 
to start with the highest-order interactions and confound as few of the main effects 
and first-order interactions as possible. 

To this end, Fisher (1942) proved, using Abelian group theory, that in a factorial
experiment with $2^n - 1$ factors each at two levels (the $2^{2^n-1}$ factorial), no main effect
or first-order interaction need be confounded in a single replication of the treatments,
provided that $k \ge 2^n$, i.e. if block size exceeds the number of factors. He later (Fisher
(1945)) extended his treatment to factors with $p^n$ levels, where p is a prime (cf. also
Mann (1949)). Kempthorne (1952) gives a very detailed treatment of the subject,
including factors with different numbers of levels. Cochran and Cox (1957) discuss 
the applications, with detailed plans of confounded block arrangements. Yates (1937) 
gives many examples, with applications in agricultural experimentation, while Davies 
et al. (1954) give examples in industrial experiments. 


38.81 One of the important applications of confounding is in fractionally replicated 
factorial experiments, where certain interactions are assumed to be zero in order that 
the remaining main effects and interactions may be estimated by using only some 
of the blocks of a confounded block design. The assumptions are essential to enable 
the analysis to distinguish between otherwise indistinguishable effects, which are called 
aliases. The theory is treated by Kempthorne (1952), and some discussion appears 
in Cochran and Cox (1957). Davies et al. (1954) discuss fractionally replicated designs 
in their application to industrial experiments, where they are often useful when large 
numbers of factors are to be tested. 


38.82 Many other confounded factorial designs are now available to the experimenter.
Split-plot designs confound main effects by assigning every level of a factor
to the units in a block, an arrangement which is sometimes convenient and even
necessary for the practical conduct of the experiment. Other, more elaborate, forms
of confounding in factorial experiments (quasi-Latin squares, plaid squares) are
described and analysed in the specialized books to which we have already referred.


Sequences of experiments: evolutionary operation 

38.83 The book by Quenouille (1953a) lays particular emphasis on the problems 
of planning and analysis of long-term experiments and groups of experiments (see 
also Cochran and Cox (1957) and Kempthorne (1952)). 

A different but related field which has been opened up recently is the use of a sequence
of experiments to find the optimum combination of factors, i.e. to maximize the yield
(or some other quality) of the end-product of a process for fixed cost, or equivalently
to minimize cost for a fixed yield. The method is to fit a response surface to the
experimental points by LS, and to move the area of experimentation along a path of
steepest ascent until it is near a stationary point, which is then explored to investigate 
its character. Experiment designs with special properties of symmetry, called rotatable 
designs, have been developed for the exploration of response surfaces. For the theory 
of this method of evolutionary operation and the associated rotatable designs, the reader 
should consult Box and Wilson (1951), Box and Hunter (1957), Bose and Carter (1959), 
Gardiner et al. (1959), Bose and Draper (1959), Draper (1960a, b, c, 1961), and Box 
and Behnken (1960). Less theoretical expositions are given in the final chapter of 
Davies et al. (1954) and by Box (1957). 

Evolutionary operation methods are not properly sequential methods—they have 
no well-defined overall probabilistic properties, and have been criticized on this score— 
but there can be no doubt of their practical importance. 


Regression designs 

38.84 We end this chapter with an account of the design of experiments whose 
object is to carry out a regression analysis. 

As we mentioned in 38.2, Example 28.4 dealt with the problem of designing a 
simple linear regression experiment. We found there that we could minimize the 
sampling variances of the LS estimators of the two regression parameters by making 
half of the observations as far below the origin as possible, and the other half the same 
distance above the origin. We remarked there that this corresponds to the fact that 
a straight line is most efficiently “ fixed " by its end-points. 

Consider now the more general problem of allocating values to the regressor x 
when the expected value of y is a polynomial in x. We have treated the theory of 
polynomial regression in 28.16-20, taking the values of x for granted; now we ask 
how to choose values of x so that the parameters of the polynomial regression equation 
are optimally estimated. 


38.85 We assume that the values of x are to be allocated in a fixed interval, which 
without loss of generality may be called (—1, +1). The intuitive argument above 
extends to the polynomial case, for we know that a polynomial of degree k can be 
“ fixed ” by (k+1) points. Moreover, one still expects in this polynomial case that 
one of the points “ ought ” to be at each end of the interval. This intuitive argument 
is asymptotically valid for the general polynomial regression: Kiefer (1959) shows 
that at most (k+1) distinct values of x are required, of which at most (k-1) are interior
to the interval, provided that we ignore the analytical complications due to the numbers
of observations being integers, which disappear as $n \to \infty$. Characteristically, however,
the exact theory, taking these complications into account, is not so simple; intuitive
arguments cannot be expected to hold here.


38.86 Now consider the choice of the (k+1) distinct observation points
$x_1 < x_2 < \dots < x_{k+1}$ to minimize the generalized variance of the estimators of the parameters.
In this polynomial case we have

$$y = X\theta + \epsilon,$$

where the matrix X has the form

$$X_{(n \times (k+1))} = \begin{pmatrix}
1_{n_1} & x_1 1_{n_1} & x_1^2 1_{n_1} & \cdots & x_1^k 1_{n_1} \\
1_{n_2} & x_2 1_{n_2} & x_2^2 1_{n_2} & \cdots & x_2^k 1_{n_2} \\
\vdots & \vdots & \vdots & & \vdots \\
1_{n_{k+1}} & x_{k+1} 1_{n_{k+1}} & x_{k+1}^2 1_{n_{k+1}} & \cdots & x_{k+1}^k 1_{n_{k+1}}
\end{pmatrix}, \qquad (38.141)$$

where $n_i$ observations are taken at the point $x_i$, i = 1, 2, ..., k+1, and $\sum_{i=1}^{k+1} n_i = n$.
Thus the dispersion matrix of $\hat{\theta}$ is

$$V(\hat{\theta}) = \sigma^2 (X'X)^{-1} = \sigma^2 (Z'NZ)^{-1}, \qquad (38.142)$$

where

$$Z_{((k+1) \times (k+1))} = \begin{pmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^k \\
1 & x_2 & x_2^2 & \cdots & x_2^k \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{k+1} & x_{k+1}^2 & \cdots & x_{k+1}^k
\end{pmatrix} \qquad (38.143)$$

and

$$N_{((k+1) \times (k+1))} = \mathrm{diag}(n_1, n_2, \dots, n_{k+1}). \qquad (38.144)$$

We therefore see that the effect of having (k+1) observation points and (k+1)
parameters is to make $X'X$ a product of three square matrices. Hence

$$\sigma^{2(k+1)}/|V(\hat{\theta})| = |X'X| = |Z'|\,|N|\,|Z| = |N|\,|Z|^2, \qquad (38.145)$$

and if the generalized variance is to be minimized, (38.145) is to be maximized. We
carry out this maximization in two stages. First,

$$|N| = \prod_{i=1}^{k+1} n_i$$

is at once seen to be maximized for choice of the $n_i$ when they are all equal, whatever
$|Z|^2$ may be. The latter alone remains to be maximized for choice of the $x_i$.


38.87 Now the reader may verify that

$$|Z| = \prod_{i<j} (x_j - x_i),$$

so that

$$|Z|^2 = \prod_{i<j} (x_j - x_i)^2. \qquad (38.146)$$

It is obvious at once from the form of (38.146), a product of squares, that it can
always be increased by moving the extreme observation points to the ends of the interval.
What is more, (38.146) when maximized will always be larger for a larger interval
than for a smaller. Thus we see that we should always observe over the largest possible
interval and locate one observation point at each end of that interval, which we shall
continue to refer to as (-1, +1). At each observation point, an equal number n/(k+1)
of observations is to be made.


38.88 We may now solve (38.146) ad hoc for the smaller values of k. For k = 1, 
the linear case, nothing further is needed, for the result of 38.87 has already confirmed 
Example 28.4. For the quadratic case, we have to locate хь, with x, = —1 and 
хз = +1, to maximize (38.146), which is now simply 

| Z|? = 41-92. 
This is maximized at x, = 0, and this must obviously be the other observation point, 
from consideration of symmetry. 

In the cubic case, we explicitly use the symmetry to reduce the problem to locating 
x, and x, = —ху. (38.146) is now 

| Z|? = 16x3(1—x)* 
which is maximized when xj = 4. 

In the quartic case, symmetry locates x, at zero, and we require only x, and x, = —x,. 

(38.146) becomes 
| Z|? = 164(1— 4), 
maximized when xj = $. 

These results, and the next two, which are as many as are needed in practice, are 
summarized in the following table: 

Degree of     | Observation points in (-1, +1) at which
polynomial, k | n/(k+1) observations should be made
--------------+----------------------------------------
      1       | ±1
      2       | ±1, 0
      3       | ±1, ±0.4472                                   (38.147)
      4       | ±1, ±0.6547, 0
      5       | ±1, ±0.7651, ±0.2852
      6       | ±1, ±0.8302, ±0.4689, 0


38.89 Hoel (1958) showed that the optimum observation points which maximize
(38.146) are expressible in terms of the Legendre polynomials which are defined in
30.37. Guest (1958) considered a different criterion of optimality, the minimization
of the maximum variance of the fitted polynomial at any point in the interval, which
led to the same optimum values; he showed that the interior optimum values are given
explicitly by the zeros of the derivative of the kth order Legendre polynomial. The same criterion of
optimality had been considered in a paper by K. Smith (1918), who first calculated 
the values (38.147). This was apparently the first design problem to be solved in 
detail, and it is all the more surprising that the paper was more or less forgotten for 
forty years. Smith’s paper also contains a series of charts comparing the variance 
of the fitted polynomial throughout the interval when the observations are made (a) 
by the optimum method; (b) by the method of uniform spacing of observations, which 
is better in the centre of the interval, but much worse at the extremes; and (c) by 
method (b) with an additional group of observations at each end of the interval, which 
removes the worst effects of purely uniform spacing. The advantage of method (c), 
of course, is that it does not presuppose any knowledge of k, and enables the experi- 
menter to investigate its value from the observations, whereas the optimum method 
of allocation cannot be used to investigate a higher value of k—this is precisely the 
point which we made in Example 28.4 for the linear case. It seems wise, in any case, 
to use the values in (38.147) corresponding to the highest value of k which the experi- 
menter would be willing to consider. 
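
The characterization in 38.89 can be checked numerically. The following is a minimal sketch
in Python with NumPy (neither is part of the original text, and the function names
`optimum_points` and `log_z_squared` are purely illustrative). It takes the interior points as
the zeros of the derivative of the kth Legendre polynomial, adjoins the end-points ±1, and
evaluates the logarithm of (38.146) so that the effect of moving a point can be examined.

```python
import numpy as np
from numpy.polynomial import legendre


def optimum_points(k):
    """End-points -1, +1 together with the zeros of P_k'(x), in increasing order."""
    interior = legendre.Legendre.basis(k).deriv().roots()
    return np.sort(np.concatenate(([-1.0], np.real(interior), [1.0])))


def log_z_squared(points):
    """Logarithm of (38.146): the sum over i < j of log (x_i - x_j)^2."""
    x = np.asarray(points, dtype=float)
    i, j = np.triu_indices(len(x), k=1)
    return float(np.sum(np.log((x[i] - x[j]) ** 2)))


# Print the candidate observation points for k = 1, ..., 6 for comparison
# with the table (38.147).
for k in range(1, 7):
    print(k, np.round(optimum_points(k), 4))
```

Comparing `log_z_squared` at these points with its value after a small displacement of any
interior point gives a quick check that the tabulated points are at least a local maximum.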


K. Smith (1918) goes on to consider the effect of heteroscedasticity of errors on the 
optimum allocation. Hoel (1958) considers some special cases of correlated observations. 


38.90 Other criteria of optimality have also been used. Hoel and Levine (1964) 
and Levine (1966) consider the allocation of observations in polynomial regression to 
minimize the variance of the fitted polynomial at a specified point outside the interval 
of observation (—1, +1), and it transpires that this optimum allocation also minimizes 
the maximum variance over an interval (—1, x) if x is large enough. Gaylor and 
Sweeny (1965) consider minimizing the maximum variance, and an average variance, 
over any interval (arbitrarily related to the interval of observation) for the linear case 
only. H. A. David and Arens (1959) consider using two observation points in the 
linear case to minimize expected mean-square-error or maximum expected squared- 
error, the latter differing from minimizing maximum variance since the possibility 
is allowed that the true degree of the polynomial may be 2 rather than 1. A very 
general paper by Kiefer and Wolfowitz (1959), not confined to the polynomial case, 
uses game-theoretic results to compute optimum allocations using a number of criteria 
(see also the summary in Kiefer (1959)). Hoel (1965a) applies these methods to obtain 
optimum allocations for two-dimensional polynomial and trigonometric regressions. 
Hoel (1965b) finds the designs in univariate polynomial regression which minimize 
the variance of the fitted value for x “extrapolated” into an interval lying immediately
between two intervals in which observations are made, and also obtains corresponding 
designs in bivariate polynomial regression, as well as for ordinary extrapolation in the 
bivariate case.


EXERCISES


38.1 In 38.25, verify the identity of the two expressions (38.38) for the SS attributable 
to treatment differences in a block experiment. 


38.2 By generalizing the argument of 38.28, show that if we divide the treatment parameter 
estimators into groups and require zero correlation between members of different groups, each 
block must contain the same number of treatments from a given group, and thus the experiment 
may be resolved into a set of independent sub-experiments with smaller blocks. 

(Tocher, 1952) 


38.3 In 38.28, show that if a block experiment is designed to make the variance of each 
treatment parameter estimator equal to its value if there are no block effects, $\sigma^2\{\mathrm{diag}(r)\}^{-1}$,
this requires that M = 0, and hence leads to the randomized blocks design with incidence 
matrix given by (38.46). 

(Tocher, 1952) 


38.4 Verify the simplified formulae (38.47-51) for randomized blocks designs. 


38.5 Verify the formulae in 38.32-3 for the LS analysis of blocks classified in a two-way 
table, and construct the AV table as at (38.39). 


38.6 From the general results of 38.33, show that the AV table for the Latin square design 
in 38.35 is: 


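Source of variation   | SS                                      | D.fr.
----------------------+-----------------------------------------+------------
Treatment differences | T'T/t - G^2/t^2                         | t-1
Rows                  | R'R/t - G^2/t^2                         | t-1
Columns               | C'C/t - G^2/t^2                         | t-1
Residual              | y'y - (T'T + R'R + C'C)/t + 2G^2/t^2    | (t-1)(t-2)
General mean          | G^2/t^2                                 | 1
Total                 | y'y                                     | t^2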


Generalize this to Graeco-Latin and higher-order orthogonal square designs. 


38.7 In 38.39, show that no more than (t—1) Latin squares of order t can be mutually 
orthogonal. 


38.8 Verify that the inverse of (38.79) is 
14-1D-!'w, —1;D-!1, i 
Q- = D-1-D-!(w | 1j) --—i--2----||---|D-3/5, 
-w Dw 1c-lD-w/iw 
where A is the determinant of the 2 x 2 partitioned matrix above. Hence show through (38.23) 


that (38.85) holds for any D. 
(Tocher, 1952) 


38.9 Show that if D is determined so that every treatment difference is estimated with
the same loss of efficiency (compared to the situation where there are no block effects to eliminate),
(38.85) leads to (38.95), the BIB design equation. If the loss of efficiency is zero, show that
this reduces to the randomized blocks design (cf. Exercise 38.3).

(Tocher, 1952)


38.10 In a superposed complete set of orthogonal Latin squares of order T (e.g. (38.76)
for T = 4), each of the $T^2$ cells of the arrangement has (T+1) “references,” identifying its
row, its column, and its position in each of the (T-1) “alphabets” forming the complete set,
and each reference can take T distinct values. Show that if each reference in turn is used to
allocate each of the cells to one of a set of T blocks, we obtain a BIB design with

    $t = T^2$,  b = T(T+1),  k = T,  r = T+1,  λ = 1.

Verify that in the case of (38.76), the resulting BIB design is:


Block number  |  1  2  3  4 |  5  6  7  8 |  9 10 11 12 | 13 14 15 16 | 17 18 19 20
--------------+-------------+-------------+-------------+-------------+------------
Treatments    |  1  5  9 13 |  1  2  3  4 |  1  2  3  4 |  1  2  3  4 |  1  2  3  4
in            |  2  6 10 14 |  5  6  7  8 |  6  5  8  7 |  7  8  5  6 |  8  7  6  5
block         |  3  7 11 15 |  9 10 11 12 | 11 12  9 10 | 12 11 10  9 | 10  9 12 11
              |  4  8 12 16 | 13 14 15 16 | 16 15 14 13 | 14 13 16 15 | 15 16 13 14
--------------+-------------+-------------+-------------+-------------+------------
Obtained from | Rows        | Columns     | Roman       | Greek       | Numerals
              |             |             | letters     | letters     |


38.11 In Exercise 38.10, show that if we augment each of the T blocks obtained from a
particular reference by a further treatment, where a different such treatment is used for each
distinct reference, and finally add a further single block containing all the (T+1) further
treatments, we obtain a symmetric BIB design with $t = b = T^2+T+1$, k = r = T+1, λ = 1.
Verify this augmented design for the design derived from (38.76) in Exercise 38.10.

By considering the case T = 6 (cf. 38.39) show that satisfaction of (38.93-5) is not sufficient
for the existence of a BIB design.


38.12 In Exercise 38.10, show that the dual design obtained by putting t = b, b = t, k = r
and r = k has λ < 1 and hence cannot be a BIB design. In Exercise 38.11, show that the
dual of the augmented design derived from (38.76) is:

Block number |  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21
-------------+---------------------------------------------------------------
Treatments   |  1  1  1  1  2  2  2  2  3  3  3  3  4  4  4  4  1  5  9 13 17
in           |  5  6  7  8  5  6  7  8  5  6  7  8  5  6  7  8  2  6 10 14 18
block        |  9 10 11 12 10  9 12 11 11 12  9 10 12 11 10  9  3  7 11 15 19
             | 13 14 15 16 15 16 13 14 16 15 14 13 14 13 16 15  4  8 12 16 20
             | 17 18 19 20 20 19 18 17 18 17 20 19 19 20 17 18 21 21 21 21 21

Verify that this is a BIB design.
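
The verifications asked for in Exercises 38.10-12 are mechanical and can be sketched in a
few lines of Python (not part of the original text; the helper names `bib_parameters`,
`augment` and `dual` are illustrative assumptions). The blocks are those of the design
tabulated in Exercise 38.10; a design is accepted as a BIB only if block size, number of
replications and number of concurrences of every pair of treatments are all constant.

```python
from collections import Counter
from itertools import combinations

# Blocks 1-20 of the design of Exercise 38.10 (treatments numbered 1-16).
BLOCKS = [
    (1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 12), (13, 14, 15, 16),   # rows
    (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15), (4, 8, 12, 16),   # columns
    (1, 6, 11, 16), (2, 5, 12, 15), (3, 8, 9, 14), (4, 7, 10, 13),   # Roman letters
    (1, 7, 12, 14), (2, 8, 11, 13), (3, 5, 10, 16), (4, 6, 9, 15),   # Greek letters
    (1, 8, 10, 15), (2, 7, 9, 16), (3, 6, 12, 13), (4, 5, 11, 14),   # numerals
]


def bib_parameters(blocks):
    """Return (t, b, k, r, lambda) if the blocks form a BIB design, else None."""
    treatments = sorted({x for blk in blocks for x in blk})
    sizes = {len(set(blk)) for blk in blocks}
    reps = Counter(x for blk in blocks for x in blk)
    pairs = Counter(p for blk in blocks for p in combinations(sorted(blk), 2))
    lams = {pairs.get(p, 0) for p in combinations(treatments, 2)}
    if len(sizes) != 1 or len(set(reps.values())) != 1 or len(lams) != 1:
        return None
    return (len(treatments), len(blocks), sizes.pop(), reps[treatments[0]], lams.pop())


def augment(blocks, group_size):
    """Exercise 38.11: add one new treatment to every block of each consecutive
    group of `group_size` blocks, then a final block of all the new treatments."""
    t = max(x for blk in blocks for x in blk)
    new = [t + 1 + i // group_size for i in range(len(blocks))]
    out = [blk + (extra,) for blk, extra in zip(blocks, new)]
    out.append(tuple(sorted(set(new))))
    return out


def dual(blocks):
    """Dual design: one block per original treatment, listing the (1-based)
    numbers of the original blocks that contain it."""
    treatments = sorted({x for blk in blocks for x in blk})
    return [tuple(j + 1 for j, blk in enumerate(blocks) if x in blk)
            for x in treatments]


print(bib_parameters(BLOCKS))                       # design of Exercise 38.10
print(bib_parameters(dual(BLOCKS)))                 # its dual (Exercise 38.12)
print(bib_parameters(dual(augment(BLOCKS, 4))))     # dual of the augmented design
```

The first block of `dual(augment(BLOCKS, 4))` is (1, 5, 9, 13, 17), which is the first column
of the table in Exercise 38.12.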


38.13 Show from (38.93) that for any BIB design, 
    $r/(t-1) = (r-\lambda)/(t-k)$,
and with t = ck, b = cr as above (38.98), use (38.93-4) to show that 
r(c-1) r-A_ b-r 
Miedo E tat 
where I is a positive integer. Hence establish (38.98). 


= 7-00 = I, 


(V. N. Murty, 1961)


38.14 Using (38.45-6), show that for randomized blocks the estimator (38.118) is identical
with the intra-block estimator (38.29), both reducing to (38.48), so that use of the mixed model
does not change the estimator in this case. Show that the dispersion matrix of the estimators,


2 
(38.112), is а, +р1 11), differing from the intra-block result ((38.40) and (38.47), the variances 


in particular being increased. Show that if we measure from the general mean of the sample, 
the distribution of is identical in the two models. 


38.15 In 38.57, use (38.1) to show that

    $E(B) = n'\tau$

and

    $V(B) = k(1+k\rho)\sigma^2 I_b$,

so that we may estimate the treatment parameters unbiassedly from the block totals by

    $\hat\tau_B = (nn')^{-1} n B$,

with

    $V(\hat\tau_B) = k(1+k\rho)\sigma^2 (nn')^{-1}$,

provided that nn' is non-singular, which implies b ≥ t.

Show that if G is used as origin (effectively removing the general mean from the treatment 
parameters) Ês is uncorrelated with 4 defined at (38.29), and hence use the generalized LS 
method of 19.17 (Vol. 2) and the dispersion matrix (38.40) to show that the linear combination
of them with smallest variances is 


fy = [sos ()— (тг p + eon" {т = [cn T D 


agreeing with (38.29) as ρ → ∞.
(cf. Tocher, 1952) 


38.16 Show that €; of Exercise 38.15 coincides with the MV estimator € at (38.118) if and 
only if r£ = С. Using (38.31) (where € was defined by (38.24)) show that this relation always 
holds. 


38.17 In a BIB design, the observations within each block are replaced by their ranks
1, 2, ..., k. If $T_i$ is the total of the ranks thus allotted to the ith treatment, and


D 
S = X (Nh-T9 = ETI-tr(- 1), 
i=1 
with maximum value $S_{\max} = \lambda^2 t(t^2 - 1)/12$, show by extending 37.40-1 and Exercise 37.14 that
$W = S/S_{\max}$ has


_ (k+1) 
E(W) = +1) 
_ 2(k+1)* 5-1 
uis 2 rt(t+1)* ( нЕ | 
exactly, and that, approximately, 
A(t+1) 
M =) w/0-W) 


has the variance-ratio distribution with d.fr. 
э =1-1-- 
А Jj 


э = (7—1), 
reducing to the results of Exercise 37.14 when k = t, λ = r, and the BIB becomes a randomized
blocks design.

(Durbin, 1951)


38.18 For any BIB design resolvable into r replicates of the t treatments, show that the
number of treatments common to two blocks in different replicates has mean equal to k?/t and 
sum of squares about this mean proportional to (b—t—r+1). Hence show that (38.98) holds, 
and that if and only if the equality holds in (38.98), there are exactly the same number (i.e. 
k?/t) of treatments common to any two blocks in different replicates. 

(Bose, 1942) 


38.19 Verify the observation points given for k = 5 and k = 6 in the table (38.147). 
(K. Smith, 1918)


38.20 If t is odd, p is its smallest prime factor, and k < p, show that a BIB may always be
constructed by the following equal-differences method. Label the treatments 0, 1, 2, ..., t-1
and construct ½(t-1) initial blocks containing the treatments [0, 1, 2, ..., k-1];
[0, 2, 4, ..., 2(k-1)]; ...; [0, ½(t-1), t-1, ..., ½(k-1)(t-1)], where every number in the
blocks is calculated as a residue (mod t). From each initial block form a new block by adding
to each of its treatments an integer r; this is done for every such integer 1 ≤ r ≤ t-1. We thus
obtain a set of t blocks (one for each member of the residue class mod t) for each initial block.
Show that the resulting BIB has b = ½t(t-1), r = ½k(t-1) and λ = ½k(k-1).

(Gassner, 1965)
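
The construction of Exercise 38.20 is easily tried out for particular t and k. The sketch
below is in Python (not part of the original text; the function names are illustrative). It
generates the ½t(t-1) blocks and counts how often each pair of treatments occurs together;
the values t = 7, k = 3 are merely an example satisfying the conditions of the exercise.

```python
from collections import Counter
from itertools import combinations


def equal_differences_blocks(t, k):
    """All t(t-1)/2 blocks generated from the initial blocks
    [0, j, 2j, ..., (k-1)j] (mod t), j = 1, ..., (t-1)/2."""
    blocks = []
    for j in range(1, (t - 1) // 2 + 1):
        initial = [(i * j) % t for i in range(k)]
        for shift in range(t):          # shift 0 reproduces the initial block
            blocks.append(tuple(sorted((x + shift) % t for x in initial)))
    return blocks


def concurrences(blocks):
    """Number of blocks in which each unordered pair of treatments occurs together."""
    return Counter(pair for blk in blocks for pair in combinations(blk, 2))


t, k = 7, 3                              # illustrative values: t odd and k < p = 7
blocks = equal_differences_blocks(t, k)
print(len(blocks), set(concurrences(blocks).values()))
```

If the design is balanced, the set printed contains the single value λ, and every one of the
½t(t-1) pairs of treatments appears among the keys of the counter.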


CHAPTER 39
SAMPLE SURVEY THEORY: DESIGNS 


The estimation of means of finite populations 

39.1 At the beginning of our discussion of random sampling in 9.1-4, Vol. 1, we 
pointed out that the application of probability theory to a sampling procedure requires 
only that it should specify the chances of selection of all possible samples; it is by no 
means necessary that the procedure should be simple random sampling, which gives 
every possible sample from the population the same chance of selection. We went on 
to distinguish between sampling without replacement from a finite population, which 
necessarily involves dependence between successively selected members of the popula- 
tion, and sampling with replacement, which removes that dependence. 

For almost the whole of these three volumes, we have been (and shall be) concerned 
only with simple random sampling. In this and the following chapter, however, we 
are specifically interested in other forms of random sampling. These arise, in practice,
when there is a problem of sampling a finite population, i.e., a population with a finite 
number (which we shall always denote by N) of members. Even in sampling with 
replacement, we now have a new feature arising from the finiteness of the population: 
every individual in the population is recognizable, so that if a value appears twice in the 
sample, we can know whether it is the same individual appearing twice, or two different 
individuals with the same value. 

The direct interest of sample survey theory, as this branch of the subject has come to 
be called, is almost entirely in the estimation of means (or, equivalently, totals) of the 
variables being studied. The theory which we shall study is nevertheless more general
than this, since the mean of any function of a variable (e.g. of its square) may be treated
by the same method. We shall study the estimation of variances and other constants
of the population only in so far as this is necessary to throw light upon our central 
concern, the estimation of means. Results for proportions are always derivable from 
those for means by specializing the variable to take values 0 or 1 only. 

We are thus entering a rather narrow area of statistical theory, but it is an area 
which has been intensively cultivated, and this on grounds of its practical importance 
rather than of its mathematical attractiveness. The large journal literature has in 
recent years been summarized and supplemented by several books, notably those of 
Cochran (1963), Hansen et al. (1953) and Yates (1960). The last-mentioned book 
contains extensive bibliographies of the theoretical and applied work on survey sampling 
which have been brought up to date in new editions since its first publication in 1949. 
M. N. Murthy (1963) reviews recent theoretical developments.

We shall have to confine our discussion to the theoretical aspects of sample surveys, 
and our aim will be to display them in the context of general statistical theory. We
cannot hope that all the results of importance will be elegantly displayed, but we shall 
try to minimize the inherent cumbrousness of the subject. 

Random sampling with equal probabilities without replacement

39.2 We wish to estimate the mean $\mu$(*) of a variable y in a finite population with
N members, from a sample of n members drawn at random without replacement,
using equal probabilities of selection. It is perhaps not quite obvious that (in the
absence of any knowledge of the form of the population) the MV estimator of $\mu$ is
the sample mean m. We now use the Least Squares theory of Chapter 19 to demonstrate this.

Consider the model

    $y = 1\mu + e$    (39.1)
    (n×1)   (n×1)(1×1)   (n×1)

where 1 is a vector of units. The errors $e_i$ are the deviations of the observations from
the population mean, and they are not uncorrelated, because the drawings are not
independent. By the symmetry of the situation, the covariance, say $\rho\mu_2$, between any
pair of observations in the sample is the same. It is this symmetry which leads us to
expect the sample mean to be the best estimator of $\mu$. The dispersion matrix of the
errors is


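    $V(e) = \mu_2 V = \mu_2 \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix}$.    (39.2)

By 19.17 and Exercise 19.5, the MV unbiassed linear estimator of $\mu$ is the LS estimator

    $\hat\mu = (1'V^{-1}1)^{-1}\, 1'V^{-1}y$    (39.3)

with variance

    $V(\hat\mu) = \mu_2 (1'V^{-1}1)^{-1}$.    (39.4)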


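To use (39.3-4), we must evaluate $V^{-1}$. As may be verified by multiplication with V,
this is

    $V^{-1} = \frac{1}{(1-\rho)\{1+(n-1)\rho\}} \begin{pmatrix} \{1+(n-2)\rho\} & -\rho & \cdots & -\rho \\ -\rho & \{1+(n-2)\rho\} & \cdots & -\rho \\ \vdots & \vdots & \ddots & \vdots \\ -\rho & -\rho & \cdots & \{1+(n-2)\rho\} \end{pmatrix}$.    (39.5)

Hence

    $1'V^{-1} = \{1+(n-1)\rho\}^{-1}\, 1'$,
    $1'V^{-1}1 = n\{1+(n-1)\rho\}^{-1}$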

(*) For convenience, we write $\mu$ (without suffix) for the mean $\mu'_1$ of the population in this
and the next chapter only. The suffixed moments $\mu_r$ will denote the higher central moments
as usual. The corresponding sample moments are written (again in these chapters only) m and
$m_r$, r ≥ 2.

and (39.3-4) becomes

    $\hat\mu = m$,    (39.6)

    $V(\hat\mu) = \frac{\mu_2}{n}\{1+(n-1)\rho\}$.    (39.7)


So far, we have not had to evaluate the correlation, $\rho$. We have delayed this deliberately,
since the whole of this section remains valid whatever $\rho$ may be. In our present
application, we may easily evaluate $\rho$ by noting that when n = N, $V(\hat\mu)$ reduces to zero,
since the whole population is sampled. Thus, from (39.7), $\rho = -1/(N-1)$ when n = N.
But clearly, the correlation between a pair of sample values is independent of sample
size. Thus, quite generally for random sampling without replacement,

    $\rho = -\frac{1}{N-1}$    (39.8)

and (39.7) becomes

    $V(\hat\mu) = \frac{\mu_2}{n} \cdot \frac{N-n}{N-1}$.    (39.9)
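
For a small population, (39.9) can be confirmed by listing every possible sample. The
following is a minimal sketch in Python (not part of the original text; the population values
and the function names are arbitrary illustrations). It enumerates all samples of size n drawn
without replacement, each with equal probability, and compares the variance of their means
with the right-hand side of (39.9).

```python
from itertools import combinations
from math import isclose
from statistics import fmean, pvariance


def exact_variance_of_mean(population, n):
    """Variance of the sample mean over all equally likely samples of size n
    drawn without replacement, obtained by complete enumeration."""
    means = [fmean(sample) for sample in combinations(population, n)]
    return pvariance(means)


def variance_39_9(population, n):
    """Right-hand side of (39.9); mu_2 is the population second moment (divisor N)."""
    N = len(population)
    mu2 = pvariance(population)
    return mu2 / n * (N - n) / (N - 1)


population = [2.0, 3.0, 5.0, 8.0, 13.0, 21.0]   # arbitrary illustrative values
for n in range(1, len(population) + 1):
    exact = exact_variance_of_mean(population, n)
    formula = variance_39_9(population, n)
    print(n, exact, formula, isclose(exact, formula, abs_tol=1e-9))
```

The two columns agree for every n, and both vanish when n = N, which is the fact used above
to evaluate the correlation.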


The moments of the sample mean and their unbiassed estimators 

39.3 In fact, (39.9) and also the third and fourth central moments of the distribution
of m have already been derived in Example 12.8 using combinatorial techniques. We
there found it algebraically convenient to work in terms of sample and population
k-statistics, $k_r$ and $K_r$, and throughout this chapter we shall do the same. In fact, we shall
redefine the population “variance” as

    $\sigma^2 = K_2 = \frac{N}{N-1}\,\mu_2$

and the sample “variance” as

    $s^2 = k_2 = \frac{n}{n-1}\,m_2$;

we retain the name “second moment” for $\mu_2$ and $m_2$. With this notation, we rewrite
the results of Example 12.8 (which include (39.9)) as(*)

    $E(m) = \mu$,
    $V(m) = \sigma^2\left(\frac{1}{n} - \frac{1}{N}\right)$,
    $E(m-\mu)^3 = K_3\left\{\left(\frac{1}{n^2} - \frac{1}{N^2}\right) - \frac{3}{N}\left(\frac{1}{n} - \frac{1}{N}\right)\right\}$,    (39.10)


Е(т-и)* = «(2-5 а-н) 


Му ҖЫ OH) Ga) HEA) 


(*) We now discard the suffix N to the expectation operator E; it is always present by
implication in this chapter and the next.

We recall from (12.109) that

    $E(k_r) = K_r$,    (39.11)

so that we may substitute $s^2$ for $\sigma^2$ and $k_r$ for $K_r$ in (39.10) to obtain the unbiassed
estimators of the central moments of m,

    $\hat V(m) = s^2\left(\frac{1}{n} - \frac{1}{N}\right)$,
    $\hat E(m-\mu)^3 = k_3\left\{\left(\frac{1}{n^2} - \frac{1}{N^2}\right) - \frac{3}{N}\left(\frac{1}{n} - \frac{1}{N}\right)\right\}$.

The unbiassed estimator of the fourth moment is slightly more complicated, since it
is not linear in the $K_r$. We require an unbiassed estimator of $\sigma^4 = K_2^2$. Exercise
12.11 gives at once the result
0-1) (Nn-n-N-1) 
elr- DNN) "| _ xe 
2(N—n) = 
(n—1)(N+1) 


(39.12) 


which reduces to 
n—1\ (N+1) , (N—n)(Nn—n—N-1),) _ ge 
51) (Pais n(n+1) N (N-1) FL EET! пы 
Substitution of the random variable in braces in (39.13) for $\sigma^4$, and $k_4$ for $K_4$ in the last
equation of (39.10) gives an unbiassed estimator of the fourth moment of m.


39.4 There is nothing to prevent us, in any given case, from estimating the first 
four moments of m as indicated in 39.3, and then fitting a Pearson distribution to 
obtain an estimate of the sampling distribution; the situation here is absolutely analogous 
to that of Maximum Likelihood estimation in 18.20, Vol. 2. However, here as there 
the process of fitting a small-sample distribution by moments is rarely carried out. 
This is less due to laziness than to the fortunate fact that the Central Limit theorem 
makes the labour unnecessary for the sample sizes encountered in practice. Although 
we shall not prove the limiting normality of m, there is a new point worth considering 
in connexion with the nature of the limiting process. We cannot simply let n tend to
infinity, since it cannot exceed N. Thus Madow (1948), who established a Central
Limit result in this case, allowed both n and N to increase, subject only to n/N remaining
bounded away from 1. It is easy to verify from (39.10) that the skewness and kurtosis
coefficients of m tend to the normal values under this limiting process. Hájek (1960)
gives a necessary and sufficient condition for the limiting normality of m.

We may thus apply the standard error techniques, as described in 9.26-9, Vol. 1,
to the distribution of m, and carry out tests of hypothetical values $\mu_0$ of the population
mean, or set confidence limits for $\mu$, in the ordinary way. It is only necessary that n
be large enough. If n/N is small, we may effectively proceed as for simple random
sampling.
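
As an illustration of the standard error technique just described, the following Python
sketch (not part of the original text) computes m, the estimator $s^2(1/n - 1/N)$ of V(m),
and approximate limits m ± z × (standard error). The function name, the default z = 1.96 and
the use of the normal approximation are assumptions made for the example; the approximation
is only appropriate when n is reasonably large.

```python
from math import sqrt
from statistics import fmean, variance


def mean_and_limits(sample, N, z=1.96):
    """Sample mean and approximate limits for the population mean, using the
    unbiassed estimator s^2 (1/n - 1/N) of V(m) for sampling without replacement."""
    n = len(sample)
    m = fmean(sample)
    s2 = variance(sample)                  # k_2 = s^2, divisor n - 1
    se = sqrt(s2 * (1.0 / n - 1.0 / N))    # estimated standard error of m
    return m, m - z * se, m + z * se


# Hypothetical data: 8 values drawn without replacement from N = 40 members.
sample = [12.0, 15.0, 9.0, 14.0, 11.0, 13.0, 10.0, 16.0]
print(mean_and_limits(sample, N=40))
```

When n/N is small the factor (1/n - 1/N) is close to 1/n, and the limits are effectively those
for simple random sampling.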


Sufficiency in sample survey theory: individuals recognizable

39.5 In 17.31, Vol. 2, we developed the concept of sufficiency. A sufficient
statistic yields all the information in the sample concerning the parameter. Now, in
sample survey theory, we do not specify the form of the population distribution, but
following Basu (1958) and Pathak (1964c) we may take as parameter the N-dimensional
vector of the values of the variable y in the population. Any sample of observations
(whether selected with or without replacement) will then yield information as to n
or fewer of the N elements of the parameter-vector. Any two samples (selected by the
same sampling scheme) which contain the same set of d (≤ n) population members
yield the same information about the parameter, irrespective of whether these d members
appear with different frequencies in the two samples; they are therefore called equivalent
samples. The set of all possible samples obtainable by a given sampling scheme can 
be partitioned into subsets in many ways. If each subset of a partition consists of 
equivalent samples, that partition is called sufficient. If some of the subsets of a sufficient 
partition can be merged and another sufficient partition thus formed, the latter is called 
a smaller sufficient partition. If there were a sufficient partition to which every other 
sufficient partition could be reduced by such a merging process, it would be a minimal 
sufficient partition (cf. 23.16, Vol. 2). 

If we define any (vector) statistic (i.e. any function of the sample values of y) t,
we induce a partition of the set of all possible samples, each subset consisting of all
samples with a particular value of t. If the partition thus induced by t is sufficient, t
itself is a sufficient statistic, since its value characterizes subsets of equivalent samples.
The conditional distribution of any other statistic, given t, will tell us nothing further
about (i.e. be free of) the parameter.

The general theory of sufficient statistics in Chapters 17 and 23 now applies. In
particular, the Rao-Blackwell result of 17.35 states that, given any unbiassed estimator 
of a function of the parameter, we can improve upon it by using instead its conditional 
expectation given a sufficient statistic. 

Simple as this result is, it has some unexpected consequences. Consider the simplest
case of random sampling with replacement from a finite population of N members, to
estimate the population mean. The intuitive estimator is the sample mean, m. However,
this can be improved upon, for in general the n sample members will include some
individuals selected more than once, and, as we have seen, it is the d ≤ n distinct
individuals in the sample which form the basis for equivalent samples. Thus the vector
statistic t, consisting of the values $y^{(1)}, y^{(2)}, \ldots, y^{(d)}$ attached to these distinct individuals
in the sample, is a sufficient statistic. The conditional expectation of m given t is
evidently the mean of the distinct individuals in the sample, say $m_d$, and this will have
smaller variance than m. Raj and Khamis (1958) give the reduction in variance
explicitly; see Exercise 39.1. There, as quite often, sample survey estimators improved
by the Rao-Blackwell method have rather complicated variances to evaluate.
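
The gain from using only the distinct individuals can be seen by complete enumeration in a
small case. The sketch below is in Python (not part of the original text; the population
values and the function name are arbitrary illustrations). It lists all $N^n$ equally likely
ordered samples drawn with replacement and computes the mean and variance of both m and $m_d$.

```python
from itertools import product
from statistics import fmean, pvariance


def compare_with_replacement(population, n):
    """Enumerate the N**n equally likely ordered samples drawn with replacement
    and return (E m, V m, E m_d, V m_d), where m is the ordinary sample mean
    and m_d the mean of the distinct individuals in the sample."""
    N = len(population)
    m, m_d = [], []
    for labels in product(range(N), repeat=n):
        m.append(fmean([population[i] for i in labels]))
        m_d.append(fmean([population[i] for i in set(labels)]))
    return fmean(m), pvariance(m), fmean(m_d), pvariance(m_d)


population = [1.0, 4.0, 6.0, 13.0]       # arbitrary illustrative values
print(compare_with_replacement(population, 3))
```

Both estimators have expectation equal to the population mean, and the variance of $m_d$ is
the smaller. For n = 2 the two estimators coincide, so the comparison is made with n = 3.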

As a general rule, in sampling with replacement, it will always improve precision 
of estimation if only the distinct selected individuals are used in any estimation process, 
rather than all the individuals selected. The variance of the latter estimator is, however, 
usually easier to estimate, and is always an upper bound to that of the former estimator,
so may be used conservatively in place of it. Unless n/N is large, the reduction in
variance will rarely be sufficiently large to justify the extra labour required to ascertain 
its extent. 

Pathak (1961a, b, 1962a, b, 1964a, b, c) has carried out a series of investigations 
of the application of the Rao-Blackwell method to various problems in sample survey 
theory. See also J. N. K. Rao (1966). 


Sampling without replacement with unequal probabilities 
39.6 We now generalize random sampling without replacement by allowing the
probabilities of selection to differ between individuals and from drawing to drawing.
Let ${}_{(r)}p_i$ be the probability that the ith individual (with value $y_i$) is selected at the rth
drawing, $\sum_{i=1}^{N} {}_{(r)}p_i = 1$; i ranges from 1 to N and r from 1 to n. Now let

    $\pi_i = \sum_{r=1}^{n} {}_{(r)}p_i$,    (39.14)

    $\pi_{ij} = \sum\sum_{r \ne s} {}_{(r)}p_i \; {}_{(s)}p_j$,    (39.15)

the later probability on the right of (39.15) being taken as conditional upon the
earlier event being realized. From their definitions, $\pi_i$ is the overall probability that
$y_i$ is selected for the sample of size n, and $\pi_{ij}$ is the joint probability that both $y_i$
and $y_j$ (i ≠ j) are selected for the sample. Clearly, the complete set of ${}_{(r)}p_i$, which we
call the selection scheme, determines the $\pi_i$ and $\pi_{ij}$; but the same set of $\pi_i$ and $\pi_{ij}$ may be
associated with different selection schemes. It is usually a difficult matter to find
values of the ${}_{(r)}p_i$ to achieve desired values of $\pi_i$ and $\pi_{ij}$, but in selection schemes such
as those in Exercises 39.5-6 the connecting equations can be solved numerically.
Fellegi (1963) gives a recursive method of making the ${}_{(r)}p_i$ equal for every drawing;
see also Brewer (1967).
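Equal-probabilities sampling without replacement, which we have already considered, has

    $\pi_i = n/N$,    $\pi_{ij} = n(n-1)/\{N(N-1)\}$.

In general the reader may verify from (39.14-15) that

    $\sum_{i=1}^{N} \pi_i = n$;    $\sum_{j \ne i}^{N} \pi_{ij} = (n-1)\pi_i$;    $\sum\sum_{i \ne j}^{N} \pi_{ij} = n(n-1)$.    (39.16)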
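
The relations (39.16) can be checked numerically for any particular selection scheme. The
following Python sketch is not part of the original text. It assumes, purely for illustration,
one simple scheme: at each drawing an individual is taken with probability proportional to a
fixed size measure p[i] among those not yet drawn. The function name and the values of p are
hypothetical.

```python
from itertools import permutations
from math import isclose


def inclusion_probabilities(p, n):
    """pi_i and pi_ij for n successive draws without replacement, each draw
    taking unit i with probability proportional to p[i] among the units not
    yet drawn (an assumed scheme, used here only as an example)."""
    N = len(p)
    pi = [0.0] * N
    pij = [[0.0] * N for _ in range(N)]
    for sample in permutations(range(N), n):
        prob, left = 1.0, sum(p)
        for i in sample:                 # probability of this ordered sample
            prob *= p[i] / left
            left -= p[i]
        for i in sample:
            pi[i] += prob
            for j in sample:
                if j != i:
                    pij[i][j] += prob
    return pi, pij


p = [0.1, 0.2, 0.3, 0.4]
n = 2
pi, pij = inclusion_probabilities(p, n)
print(sum(pi))                                              # first relation of (39.16)
print([sum(row) / pi_i for row, pi_i in zip(pij, pi)])      # second relation
print(sum(map(sum, pij)))                                   # third relation
```

With all the p[i] equal, the same function gives the equal-probabilities values of the
inclusion probabilities.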


We now follow the notational convention that sample observations are labelled
$y_1, y_2, \ldots, y_n$ in the order in which they are drawn. This will not generally coincide
with the order of labelling of the population $y_1, y_2, \ldots, y_N$. This means that, e.g.,
$y_3$ in the sample is not $y_3$ in the population, but in the interests of simplicity we
shall retain this notation when we are considering symmetric functions of the sample
values.


In taking expectations, we consider the sample as a whole, and take $nm = \sum_{i=1}^{n} y_i$
as the random variable. There are $\binom{N}{n}$ possible samples, and we suppose these
to be listed in some order and labelled $S_1, S_2, \ldots, S_r, \ldots, S_{\binom{N}{n}}$. By definition,

    $E\left(\sum_{i=1}^{n} y_i\right) = \sum_{r=1}^{\binom{N}{n}} \mathrm{Prob}\{S_r\} \left(\sum_{i=1}^{n} y_i\right)_{S_r}$.

The $\binom{N}{n}$ sets of n values in the summations may be reassembled into N sets of $\binom{N-1}{n-1}$
values, corresponding to the $\binom{N-1}{n-1}$ ways in which each of the N individuals in the
population can enter the sample. Thus

    $nE(m) = \sum_{i=1}^{N} \left[\sum_{r=1}^{\binom{N-1}{n-1}} \mathrm{Prob}\{S_r\}\right] y_i = \sum_{i=1}^{N} \pi_i y_i = T$,    (39.17)

say. Thus, as we should expect, the sample mean is not generally unbiassed for the
population mean. We also have


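    $n^2 V(m) = V\left(\sum_{i=1}^{n} y_i\right) = E\left\{\left(\sum_{i=1}^{n} y_i\right)^2\right\} - T^2 = E\left(\sum_{i=1}^{n} y_i^2\right) + E\left(\sum\sum_{i \ne j}^{n} y_i y_j\right) - T^2$,

where $T = \sum_{i=1}^{N} \pi_i y_i$ as at (39.17). The first expectation is evaluated exactly as at
(39.17), and is equal to $\sum_{i=1}^{N} \pi_i y_i^2$. In the second expectation, there are $\binom{N}{n}$ sets of
n(n-1) values, which we reassemble in N(N-1) sets of $\binom{N-2}{n-2}$ values, corresponding to the
$\binom{N-2}{n-2}$ ways in which each of the N(N-1) pairs of individuals in the population can enter
the sample. Thus

    $E\left(\sum\sum_{i \ne j}^{n} y_i y_j\right) = \sum_{r=1}^{\binom{N}{n}} \mathrm{Prob}\{S_r\} \left(\sum\sum_{i \ne j}^{n} y_i y_j\right)_{S_r} = \sum\sum_{i \ne j}^{N} \left[\sum_{r=1}^{\binom{N-2}{n-2}} \mathrm{Prob}\{S_r\}\right] y_i y_j = \sum\sum_{i \ne j}^{N} \pi_{ij}\, y_i y_j$,

and therefore

    $V\left(\sum_{i=1}^{n} y_i\right) = n^2 V(m) = \sum_{i=1}^{N} \pi_i y_i^2 + \sum\sum_{i \ne j}^{N} \pi_{ij}\, y_i y_j - T^2$
        $= \sum_{i=1}^{N} \pi_i(1-\pi_i)\, y_i^2 + \sum\sum_{i \ne j}^{N} (\pi_{ij} - \pi_i\pi_j)\, y_i y_j$.    (39.18)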



39.7 The expectations which we evaluated to obtain (39.17-18) are special instances
of the general results, for any function g of the observations

    $E\left\{\sum_{i=1}^{n} g(y_i)\right\} = \sum_{i=1}^{N} \pi_i\, g(y_i)$,
                                                                        (39.19)
    $E\left\{\sum\sum_{i \ne j}^{n} g(y_i, y_j)\right\} = \sum\sum_{i \ne j}^{N} \pi_{ij}\, g(y_i, y_j)$,

which require no further proof, since the argument which we used remains unchanged
when $y_i$ and $y_i y_j$ are replaced by arbitrary functions.

We may now obtain an unbiassed linear estimator of $\mu$. If the same weight $w_i$
is to be attached to an individual value $y_i$ in the population whenever it is selected,
we must have

    $E\left(\sum_{i=1}^{n} w_i y_i\right) = \mu$

and hence, by (39.19) with $g(y_i) = w_i y_i$,

    $\sum_{i=1}^{N} \pi_i w_i y_i = \mu = \frac{1}{N} \sum_{i=1}^{N} y_i$.

This must hold identically in the $y_i$, so we equate coefficients of $y_i$ and find $w_i = 1/(N\pi_i)$
and thus

    $\hat\mu = \frac{1}{N} \sum_{i=1}^{n} \frac{y_i}{\pi_i}$    (39.20)

is the only unbiassed estimator of this form. Further, the variance of $N\hat\mu$ is simply
(39.18) with $y_i/\pi_i$ replacing $y_i$. Thus

    $N^2 V(\hat\mu) = \sum_{i=1}^{N} \frac{1-\pi_i}{\pi_i}\, y_i^2 + \sum\sum_{i \ne j}^{N} \frac{\pi_{ij} - \pi_i\pi_j}{\pi_i\pi_j}\, y_i y_j$.    (39.21)

39.8 We can obtain an unbiassed estimator of (39.21) by use of (39.19). By
inspection, such an estimator is

    $N^2 \hat V(\hat\mu) = \sum_{i=1}^{n} \frac{1-\pi_i}{\pi_i^2}\, y_i^2 + \sum\sum_{i \ne j}^{n} \frac{\pi_{ij} - \pi_i\pi_j}{\pi_i\pi_j\pi_{ij}}\, y_i y_j$,    (39.22)

proposed by Horvitz and Thompson (1952), who first formulated the problem as
in 39.6. On the other hand, we may use (39.16) to write (39.21) in the identical form

    $N^2 V(\hat\mu) = \tfrac{1}{2} \sum\sum_{i \ne j}^{N} (\pi_i\pi_j - \pi_{ij}) \left(\frac{y_i}{\pi_i} - \frac{y_j}{\pi_j}\right)^2$.    (39.23)

Thus, by (39.19) a second unbiassed estimator of (39.21) is

    $N^2 \hat V'(\hat\mu) = \tfrac{1}{2} \sum\sum_{i \ne j}^{n} \frac{\pi_i\pi_j - \pi_{ij}}{\pi_{ij}} \left(\frac{y_i}{\pi_i} - \frac{y_j}{\pi_j}\right)^2$,    (39.24)

which was proposed by Yates and Grundy (1953).

If we could choose each $\pi_i$ proportional to $y_i$, the variance (39.23) and its estimator
(39.24) would be identically zero. (Exercise 39.3 shows that this is not necessarily so
for the alternative estimator of variance (39.22).) In practice, we can at best approximate
to this situation if we have knowledge of a variable highly correlated with y. Fellegi
(1963) and J. N. K. Rao (1963) discuss methods of achieving desired values of the $\pi_i$ and
their effects on the sampling variance of the estimator. See also Hanurav (1967).
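
The unbiassedness of (39.20), (39.22) and (39.24) can be confirmed by complete enumeration
for a small population. The Python sketch below is not part of the original text. It assumes
for illustration a scheme of successive draws with probability proportional to a size measure
p[i] among the individuals not yet drawn; the values of y and p and all function names are
hypothetical.

```python
from itertools import permutations
from math import isclose


def sample_probabilities(p, n):
    """Probability of each unordered sample under the assumed scheme of
    successive draws proportional to p among the units not yet drawn."""
    probs = {}
    for order in permutations(range(len(p)), n):
        prob, left = 1.0, sum(p)
        for i in order:
            prob *= p[i] / left
            left -= p[i]
        key = tuple(sorted(order))
        probs[key] = probs.get(key, 0.0) + prob
    return probs


def inclusion_probabilities(probs, N):
    pi = [0.0] * N
    pij = [[0.0] * N for _ in range(N)]
    for s, prob in probs.items():
        for i in s:
            pi[i] += prob
            for j in s:
                if j != i:
                    pij[i][j] += prob
    return pi, pij


def ht_mean(s, y, pi):                       # estimator (39.20)
    return sum(y[i] / pi[i] for i in s) / len(y)


def var_ht(s, y, pi, pij):                   # variance estimator (39.22)
    a = sum((1 - pi[i]) / pi[i] ** 2 * y[i] ** 2 for i in s)
    b = sum((pij[i][j] - pi[i] * pi[j]) / (pi[i] * pi[j] * pij[i][j]) * y[i] * y[j]
            for i in s for j in s if j != i)
    return (a + b) / len(y) ** 2


def var_yg(s, y, pi, pij):                   # variance estimator (39.24)
    return 0.5 * sum((pi[i] * pi[j] - pij[i][j]) / pij[i][j]
                     * (y[i] / pi[i] - y[j] / pi[j]) ** 2
                     for i in s for j in s if j != i) / len(y) ** 2


def expectation(f, probs):
    return sum(prob * f(s) for s, prob in probs.items())


y = [3.0, 7.0, 8.0, 12.0]
p = [0.1, 0.2, 0.3, 0.4]
probs = sample_probabilities(p, 2)
pi, pij = inclusion_probabilities(probs, len(y))
mean = expectation(lambda s: ht_mean(s, y, pi), probs)
true_var = expectation(lambda s: (ht_mean(s, y, pi) - mean) ** 2, probs)
print(mean, true_var)
print(expectation(lambda s: var_ht(s, y, pi, pij), probs))
print(expectation(lambda s: var_yg(s, y, pi, pij), probs))
```

The expectation of the estimator equals the population mean, and both variance estimators
have expectation equal to its exact sampling variance, although their values in individual
samples differ.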

Hájek (1964) investigates the asymptotic normality, variance and estimated variance 
of (39.20) for rejective sampling, which is sampling with unequal probabilities with replace- 
ment in which the whole sample is rejected as soon as any individual is selected a second 
time. 

It is not at first clear which of these alternative estimators of sampling variance is
preferable. It is easily shown that the estimator (39.22) can take negative values
(cf. Exercise 39.3), but (39.24) can also take negative values (cf. Exercise 39.4); it will
be non-negative definite only for schemes with every $\pi_i\pi_j - \pi_{ij} \ge 0$. It is nevertheless
true that in two selection schemes used in practice, (39.24) is never negative (Exercises
39.5-6 give details), and this is not so for (39.22). There seems no doubt that (39.24)
is generally preferable to (39.22), since the latter is, so to say, more likely to take negative
values, and for an estimator of variance (with necessarily positive expectation) this
implies a larger sampling variance.

It seems likely that there is no estimator of the sampling variance (39.21) which is 
non-negative definite for all selection schemes, but so far as we know this has not been 
demonstrated. 


39.9 However, by adopting a different linear estimator from (39.20), we can put
ourselves in the position of always having a non-negative estimator of the sampling
variance of our estimator. In order to do this, we must no longer confine ourselves
to linear functions of the form $\sum_{i=1}^{n} w_i y_i$ discussed when defining $\hat\mu$ at (39.20), for we saw
there that $\hat\mu$ is the unique such linear function which is unbiassed.

In constructing linear estimators of the population mean from a set of sample 
values, the coefficients attached to each sample value may be made to depend on: 


(a) the individual population member whose selection for the sample yields that 
value; 

(b) the drawing at which the sample value was selected; 

(c) the whole set of population members selected for the sample, rather than the 
individual member as in (a). 


The coefficients may depend on any one, two, or all three of (a), (b), (c), and there 
are thus seven general classes of linear estimator, the last class including all the others. 
They are discussed and analysed by Godambe (1955, 1965) and by Koop (1963), who 
show that we cannot find a minimum variance unbiassed estimator in this most general 
class, a proposition which is intuitively plausible from the consideration that, given 
knowledge of the population values, we can always construct a linear unbiassed estimator 
with zero variance, but that this estimator must clearly depend on the population 
values themselves. An example of such an estimator is (39.20), as we see at Exercise
39.3. Essentially, the absence of an optimum linear estimator here is due to the fact
that the individuals in the population are recognizable and the probabilities of selection
are at choice.

We have so far considered only coefficients depending on (a) alone. We now consider
an estimator whose coefficients depend on all three of (a), (b), (c).


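39.10 Let $y_{(r)}$ now denote the individual selected at the $r$th drawing. Let ${}_r p_{(r)}$
be the conditional probability of its selection at that drawing, given that it has not been
selected previously, and define

$$z_1 = \frac{y_{(1)}}{{}_1 p_{(1)}}, \qquad z_u = \sum_{r=1}^{u-1} y_{(r)} + \frac{y_{(u)}}{{}_u p_{(u)}}, \quad u = 2, 3, \ldots, n. \qquad (39.25)$$

Each of the $z_u$ is unbiassed for $N\mu$, since

$$E(z_1) = \sum_{i=1}^{N} y_i = N\mu$$

by (39.19), and for $u \ge 2$,

$$E(z_u) = E\left\{\sum_{r=1}^{u-1} y_{(r)} + E\left(\frac{y_{(u)}}{{}_u p_{(u)}} \,\Big|\, y_{(1)}, \ldots, y_{(u-1)}\right)\right\}
= E\left\{\sum_{r=1}^{u-1} y_{(r)} + \left[N\mu - \sum_{r=1}^{u-1} y_{(r)}\right]\right\} = E\{N\mu\} = N\mu.$$

Thus any linear function

$$\sum_{u=1}^{n} c_u z_u, \qquad \sum_{u=1}^{n} c_u = 1, \qquad (39.26)$$

will be unbiassed for $N\mu$. The most symmetrical such function is the mean

$$\bar z = \frac{1}{n}\sum_{u=1}^{n} z_u = \frac{1}{n}\sum_{u=1}^{n}\left\{\frac{y_{(u)}}{{}_u p_{(u)}} + (n-u)\,y_{(u)}\right\}. \qquad (39.27)$$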
(39.27) does not reduce to $Nm$ when probabilities of selection are equal (cf. Exercise
39.10). The variance of $\bar z$ is

$$V(\bar z) = \frac{1}{n^2}\left\{\sum_{u=1}^{n} \operatorname{var} z_u + \sum_{u \ne v}\sum \operatorname{cov}(z_u, z_v)\right\}. \qquad (39.28)$$

By evaluating the covariance of $z_u$ and $z_v$ in two stages, the first for $z_u$ fixed, and the
second allowing $z_u$ to vary, it follows at once that $\operatorname{cov}(z_u, z_v) = 0$, $u \ne v$. Hence

$$E(z_u z_v) = E(z_u)\,E(z_v) = N^2\mu^2. \qquad (39.29)$$

Thus we only have to evaluate the variances in (39.28). In general, $\operatorname{var} z_u$ is cumbersome
to evaluate (see Exercise 39.9), but it is easy to estimate $V(\bar z)$, for

$$V(\bar z) = E(\bar z^2) - N^2\mu^2,$$

so that, using (39.29), an unbiassed estimator of $V(\bar z)$ is $\bar z^2 - z_u z_v$ for any $u \ne v$. If
we average this estimator over all $\tfrac{1}{2}n(n-1)$ distinct pairs $u$, $v$, we obtain the estimator

$$\hat V(\bar z) = \bar z^2 - \frac{1}{n(n-1)}\sum_{u \ne v}\sum z_u z_v, \qquad (39.30)$$

which is identical with

$$\hat V(\bar z) = \frac{1}{n(n-1)}\sum_{u=1}^{n}(z_u - \bar z)^2. \qquad (39.31)$$

(39.31) is of the form $s^2/n$, where $s^2$ is the second $k$-statistic of the observed $z_u$. This
approach, due to Raj (1956), therefore reduces the problem of sampling the $y_i$ (with
unequal probabilities and without replacement) to sampling the $z_u$ and calculating their
mean and estimating its sampling variance exactly as though they were sampled with
equal probabilities and with replacement. This is a special case of a general result
given as Exercise 39.12.
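
The computation of (39.25), (39.27) and (39.31) can be sketched as follows. This is only an illustration: the function name and argument layout are assumptions, and the caller is taken to supply the conditional selection probability that applied at each drawing.

```python
def raj_estimates(y_drawn, p_drawn):
    """Return (z_bar, var_hat) from values listed in order of drawing.

    y_drawn[u] is the value obtained at drawing u+1 and p_drawn[u] is its
    conditional selection probability at that drawing. At least two
    drawings are needed for the variance estimator.
    """
    n = len(y_drawn)
    z, running = [], 0.0
    for yu, pu in zip(y_drawn, p_drawn):
        z.append(running + yu / pu)  # z_u of (39.25)
        running += yu                # sum of the values drawn so far
    z_bar = sum(z) / n               # (39.27), an estimator of N * mu
    var_hat = sum((zu - z_bar) ** 2 for zu in z) / (n * (n - 1))  # (39.31)
    return z_bar, var_hat
```

For a very small population the unbiassedness of both quantities can be confirmed by enumerating every ordered sample with its probability, which is a useful check before applying the sketch to real data.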


39.11 We have to decide which of the estimators (39.20) and (39.27) to use. Little
is known in general of their relative efficiencies, but Raj (1956) reports two sampling
experiments with $n = 2$ which favour (39.27) fairly strongly.

However, an estimator with variance never greater than (39.27) can be obtained by
applying the Rao-Blackwell method of improving an estimator (see 39.5). We calculate
(39.27) (which we now relabel $\bar z_{(o)}$ to emphasize its dependence on the order of
drawing the sample) for each of the $n!$ possible orders in which the observed sample
could have been drawn, and then average these $n!$ values using the relative probabilities
of the $n!$ orderings as weights, thus obtaining an improved estimator $\bar z_u$. The estimator
of variance at (39.31) can be treated in exactly the same way to obtain an improved
estimator of $V(\bar z_{(o)})$. These results, due to M. N. Murthy (1957), are direct consequences
of the Rao-Blackwell result of 17.35 and are given in Exercises 39.30-1.
The improved estimator $\bar z_u$ has not been shown to possess a non-negative estimator of
variance for all selection schemes, as $\bar z_{(o)}$ has, but Murthy showed that this is so for
$n = 2$. Moreover, in Raj's (1956) sampling experiments already mentioned, M. N.
Murthy (1957) confirmed that $\bar z_u$ is more efficient than $\bar z_{(o)}$ and that an unbiassed estimator
of its sampling variance fluctuates less than (39.31) does for $\bar z_{(o)}$. Finally, the
improved estimator of $V(\bar z_{(o)})$ also achieved worthwhile gains in efficiency over (39.31).

There are thus strong theoretical reasons for using the improved estimator $\bar z_u$.
However, computational problems become formidable as $n$ increases, since $n!$ different
sample orderings have to be considered, and even $\bar z_{(o)}$ as originally defined at (39.27)
requires the computation of $n$ conditional probabilities, which may require considerable
labour when population size $N$ and sample size $n$ are large. When the selection scheme
is simplified, as in Exercise 39.31, by keeping probabilities proportional at all drawings,
$\bar z_u$ takes a simpler form, but still seems formidable to compute.


Unequal probabilities in sampling with replacement 

39.12 When sampling is with replacement, drawings are independent, and we now
have the possibility that any $y_i$ is selected more than once. We define $\pi_i$ and $\pi_{ij}$ by
(39.14-15) as before, and now require the further definition

$$\pi_{ii} = \sum_{r \ne s}\sum {}_r p_i\; {}_s p_i.$$

$\pi_{ij}$ is now the probability that $y_i$ and $y_j$ are selected once or more for the sample, and
$\pi_{ii}$ the probability that $y_i$ is selected at least twice. (39.16) is replaced by

$$\sum_{i=1}^{N}\pi_i = n; \qquad \sum_{j=1}^{N}\pi_{ij} = (n-1)\pi_i; \qquad \sum_{i=1}^{N}\sum_{j=1}^{N}\pi_{ij} = n(n-1), \qquad (39.32)$$

which holds for any $i$, $j$ without restriction. Similarly (39.19) is replaced by

$$E\left\{\sum_{i=1}^{n} f(y_i)\right\} = \sum_{i=1}^{N}\pi_i f(y_i), \qquad
E\left\{\sum_{i \ne j}^{n}\sum g(y_i, y_j)\right\} = \sum_{i=1}^{N}\sum_{j=1}^{N}\pi_{ij}\, g(y_i, y_j), \qquad (39.33)$$

$i$ and $j$ being unrestricted in the final double summation in (39.33). In (39.18) and
(39.21) the $\pi_{ij}$ may now have its suffixes equal, but the term $\pi_i\pi_j$ in the double summation
must still have different suffixes, as the reader should verify. If we now use
(39.32) instead of (39.16), we find that (39.23) holds without change.

In the particular case when the probabilities of selection are the same at each drawing,
so that we may write them $p_i$ without a prefix, the theory simplifies, for we now
have $\pi_i = np_i$ and $\pi_{ij} = n(n-1)p_i p_j$, and the estimator (39.20) becomes

$$\hat\mu = \frac{1}{Nn}\sum_{i=1}^{n}\frac{y_i}{p_i}, \qquad (39.34)$$

while (39.23) becomes

$$N^2\,V(\hat\mu) = \frac{1}{2n}\sum_{i \ne j}^{N}\sum p_i p_j\left(\frac{y_i}{p_i} - \frac{y_j}{p_j}\right)^2, \qquad (39.35)$$

and using (39.33), the unbiassed estimator of (39.35) is

$$N^2\,\hat V(\hat\mu) = \frac{1}{2n^2(n-1)}\sum_{i \ne j}^{n}\sum\left(\frac{y_i}{p_i} - \frac{y_j}{p_j}\right)^2.$$

Using (2.27), Vol. 1, this is

$$N^2\,\hat V(\hat\mu) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{p_i} - \frac{1}{n}\sum_{j=1}^{n}\frac{y_j}{p_j}\right)^2 = \frac{s^2}{n}, \qquad (39.36)$$

where $s^2$ is the second sample $k$-statistic, defined as in 39.3, of $y_i/p_i$. The simplicity
of the result (39.36) springs from the fact that the $y_i/(Np_i)$ are unbiassed estimators
of $\mu$, uncorrelated because sampling is with replacement, so that Exercise 39.12 applies
here.
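
A small sketch of (39.34) and (39.36) for sampling with replacement and constant selection probabilities follows. The function name is an assumption made for illustration; the arithmetic is simply the mean and second $k$-statistic of the ratios $y_i/p_i$.

```python
def pps_wr_estimates(y, p, N):
    """Return (mu_hat, var_hat) per (39.34) and (39.36).

    y[i] and p[i] are the value and single-draw selection probability of
    the member obtained at drawing i+1; repeated members appear repeatedly.
    """
    n = len(y)
    ratios = [yi / pi for yi, pi in zip(y, p)]
    r_bar = sum(ratios) / n
    s2 = sum((r - r_bar) ** 2 for r in ratios) / (n - 1)  # second k-statistic
    mu_hat = r_bar / N
    var_hat = s2 / (n * N**2)
    return mu_hat, var_hat
```

The same variance estimate is obtained from the pairwise form preceding (39.36), which provides a direct numerical check.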


Sample designs: stratification 

39.13 We saw in 39.8 that we cannot hope in practice to find a selection scheme 
which will reduce the sampling variance (39.23) to its attainable minimum of zero. 
As indicated there, however, we may have available some general (though possibly 
rather imprecise) information about the variable y, or some other variables correlated 
with it, which enables us to improve considerably on simple random sampling. The
aim of a sample design (i.e. a choice of the $\pi_{ij}$ and hence the $\pi_i$) is to reduce estimation
variance as much as possible. (Later, we shall modify this statement to take into account
the varying costs of different sampling procedures.) If, as usual, we are dealing with
several variables simultaneously, we can at best find some compromise set of $\pi_{ij}$ which
will be effective in producing small variances for all the estimators we shall use. It
is this need for compromise which lends point to the consideration of various classes 
of selection schemes which have the aim of variance-reduction in mind.

39.14 First, consider the form of (39.23) when the $\pi_i$ are fixed and only the $\pi_{ij}$
are at choice (within the system of constraints imposed by (39.16)). (39.23) is a sum
of $\tfrac{1}{2}N(N-1)$ terms, each of which is a coefficient $(\pi_i\pi_j - \pi_{ij})$ times a non-negative
quantity $\left(\dfrac{y_i}{\pi_i} - \dfrac{y_j}{\pi_j}\right)^2$ which is now fixed by the values of the $\pi_i$. If the values $y_i$ are
unknown, it is not possible to say which pattern of $\pi_{ij}$ will be optimum, as we have
already seen, but irrespective of the values of the $y_i$, we see that whenever we fix a
$\pi_{ij}$ equal to $\pi_i\pi_j$, the coefficient will be zero. Now, we saw in 39.6 that in ordinary
equal-probabilities sampling without replacement, we have $\pi_i = \dfrac{n}{N}$, $\pi_{ij} = \dfrac{n(n-1)}{N(N-1)}$
for all $i \ne j$, whence $\pi_i\pi_j - \pi_{ij} = \dfrac{n(N-n)}{N^2(N-1)} > 0$, and every one of the $\tfrac{1}{2}N(N-1)$ terms
in (39.23) makes a positive contribution to the variance in this case. We are now in
a position to see that if we put every $\pi_i = n/N$ and some $\pi_{ij} = \pi_i\pi_j = n^2/N^2$, we are
bound to reduce the contribution to the variance of our estimator from those pairs
$i$, $j$. But because of (39.16) the $\pi_{ij}$ sum to $n(n-1)$, and the slight increase in some
$\pi_{ij}$ from $\dfrac{n(n-1)}{N(N-1)}$ to $\dfrac{n^2}{N^2}$ will be offset by a decrease in other $\pi_{ij}$, increasing the
corresponding contributions to the variance. However, if these latter reduced $\pi_{ij}$ are
associated with smaller values of $|y_i - y_j|$, while the increased $\pi_{ij}$ are associated with larger
values of $|y_i - y_j|$, we should expect a net overall reduction in sampling variance.
Moreover, since the increased $\pi_{ij}$ are only slightly increased, the compensating reduction
in the other $\pi_{ij}$ need only be small, especially if there are no fewer reduced than increased
$\pi_{ij}$.
We thus have arrived at a rather imprecise principle for improving sampling variance
with the $\pi_i$ all equal: increase the $\pi_{ij}$ from $\dfrac{n(n-1)}{N(N-1)}$ to $\dfrac{n^2}{N^2}$ wherever $|y_i - y_j|$ is large,
and decrease the $\pi_{ij}$ wherever $|y_i - y_j|$ is small. It will make our principle clearer
if we realize that when $\pi_{ij} = \pi_i\pi_j$, $y_i$ and $y_j$ must be selected by independent processes.
Thus we are investigating a procedure in which the selection scheme is broken up into
two or more sub-schemes, operated quite independently of each other. Reverting
to the notation of 39.6, we may express this in terms of the selection scheme as follows:

$$\begin{aligned}
&\text{For } 1 \le i \le N_1, && {}_r p_i \begin{cases} > 0 & \text{for } 1 \le r \le n_1,\\ = 0 & \text{otherwise;}\end{cases}\\
&\text{for } N_1 < i \le N_1 + N_2, && {}_r p_i \begin{cases} > 0 & \text{for } n_1 < r \le n_1 + n_2,\\ = 0 & \text{otherwise;}\end{cases}\\
&\qquad\vdots\\
&\text{for } \sum_{l=1}^{k-1} N_l < i \le N, && {}_r p_i \begin{cases} > 0 & \text{for } \sum_{l=1}^{k-1} n_l < r \le n,\\ = 0 & \text{otherwise.}\end{cases}
\end{aligned} \qquad (39.37)$$

In (39.37), the $N$ population members are split into $k$ groups containing $N_l$ members,
$\sum_{l=1}^{k} N_l = N$: the sample is similarly split into $k$ corresponding samples of sizes $n_l$,
$\sum_{l=1}^{k} n_l = n$. Each sub-population is independently sampled.

Our principle now tells us to choose the $y_i$ as members of the groups so that the
members of different groups are as different as possible (for the zero coefficients in
(39.23) will come from pairs in different groups) while the members of any one group
are as alike as possible (for these are the pairs whose $\pi_{ij}$ will be decreased).

Population sub-groups, each of which is to be sampled independently and the 
results combined to estimate overall population parameters, are called strata, the groups 
thus being identified metaphorically with geological layers in the population. For 
the detailed study of stratified sampling, as our selection scheme (39.37) is called, we 
shall find it convenient to start afresh with a more direct approach. 


Stratified random sampling 
39.15 As in 39.14, we suppose that the population has been divided into $k$ strata,
the $l$th of these containing $N_l$ individuals, $\sum_{l=1}^{k} N_l = N$. We now specialize the general
scheme (39.37) by supposing that independently within each stratum the sample of
$n_l$ members, $\sum_{l=1}^{k} n_l = n$, is selected with equal probabilities without replacement.
Such a scheme is called stratified random sampling without replacement. Because of
the independence of the selections in the different strata, the theory is very straightforward.
We now denote a member of the $l$th stratum by $y_{li}$, and will always reserve
the first suffix $l$ for stratum identification.

Clearly, we have $\pi_i = \dfrac{n_l}{N_l}$ for all $i$ in the $l$th stratum. Thus the unbiassed estimator
(39.20) becomes

$$\hat\mu = \frac{1}{N}\sum_{l=1}^{k}\sum_{i=1}^{n_l}\frac{y_{li}}{n_l/N_l} = \frac{1}{N}\sum_{l=1}^{k} N_l m_l, \qquad (39.38)$$

where $m_l$ is the sample mean in the $l$th stratum, whose true mean is denoted by $\mu_l$.(*)
Since the $m_l$ are independent, the sampling variance of $\hat\mu$ is

$$V(\hat\mu) = \frac{1}{N^2}\sum_{l} N_l^2 \operatorname{var} m_l = \frac{1}{N^2}\sum_{l} N_l^2\,\frac{\sigma_l^2}{n_l}\left(1 - \frac{n_l}{N_l}\right), \qquad (39.39)$$

where we have applied (39.10) in each stratum, and $\sigma_l^2$ is the $l$th stratum
variance. If we similarly write $s_l^2$ for the sample variance in the $l$th stratum, (39.12)
gives the unbiassed estimator of (39.39)

$$\hat V(\hat\mu) = \frac{1}{N^2}\sum_{l} N_l^2\,\frac{s_l^2}{n_l}\left(1 - \frac{n_l}{N_l}\right). \qquad (39.40)$$

(*) There is no danger of confusion with the notation for moments $\mu_r$, if it is remembered that
$l$ always identifies a stratum.

If we wish to make all the $\pi_i$ equal, as in the discussion of 39.14 which led us to stratified
sampling, we must have $n_l/N_l$ constant for all $l$; this is called a uniform sampling fraction
(USF). However, we need not now specialize our investigation to this extent, since
(39.38-40) are valid for any choice of the $n_l$, so that we may have variable sampling
fractions. What we must now do is to ask which choice of the $n_l$ is most efficient;
more precisely, if $n = \sum_l n_l$ is fixed, how should the integers $n_l$ be chosen to minimize
the variance (39.39)? Our treatment follows that of Armitage (1947), although the
main results were first obtained by Tschuprow in the 1920's, and independently by
Neyman (1934).
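
The estimator (39.38) and its variance estimator (39.40) are easily computed from the stratum samples. The following is a minimal sketch; the function name and data layout are assumptions made for illustration.

```python
def stratified_estimates(samples, stratum_sizes):
    """Return (mu_hat, var_hat) per (39.38) and (39.40).

    samples[l] lists the sample values from stratum l (at least two per
    stratum, so that s_l^2 exists) and stratum_sizes[l] is N_l.
    """
    N = sum(stratum_sizes)
    mu_hat = 0.0
    var_hat = 0.0
    for ys, N_l in zip(samples, stratum_sizes):
        n_l = len(ys)
        m_l = sum(ys) / n_l
        s2_l = sum((v - m_l) ** 2 for v in ys) / (n_l - 1)
        mu_hat += N_l * m_l / N
        var_hat += N_l**2 * (s2_l / n_l) * (1 - n_l / N_l) / N**2
    return mu_hat, var_hat
```

For example, with samples `[[1.0, 3.0], [10.0, 12.0, 14.0]]` from strata of sizes 4 and 6, the stratum means 2 and 12 are weighted by 0.4 and 0.6.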


Choice of stratum sample sizes 
39.16 The sampling variance (39.39) may be written

$$V(\hat\mu) = \frac{1}{N^2}\sum_{l} N_l^2\,\frac{\sigma_l^2}{n_l}\left(1 - \frac{n_l}{N_l}\right)
= \frac{N-n}{nN^2}\sum_{l} N_l\sigma_l^2
+ \frac{1}{N^2}\sum_{l} n_l\left(\frac{N_l\sigma_l}{n_l} - \frac{\sum_j N_j\sigma_j}{n}\right)^2
- \frac{1}{nN}\sum_{l} N_l\left(\sigma_l - \frac{\sum_j N_j\sigma_j}{N}\right)^2, \qquad (39.41)$$

as the reader may verify by expanding the three terms on the right of (39.41). Of
these three terms, only the second, which is non-negative, depends upon the stratum
sample sizes $n_l$ at all. Thus the sampling variance will be minimized for choice of the
$n_l$ when this term is zero, i.e. when

$$\frac{N_l\sigma_l}{n_l} = \frac{\sum_j N_j\sigma_j}{n}$$

or

$$\frac{n_l}{N_l} = \frac{\sigma_l}{\sum_j N_j\sigma_j/n}. \qquad (39.42)$$

Thus minimum sampling variance is attained when the sampling fraction $n_l/N_l$ in
each stratum is made proportional to the square root of the population variance in that
stratum. We call this the minimum variance allocation to strata, and denote the estimator
in this case by $\hat\mu_{MV}$. The sample sizes determined in this way are usually
fractional, and in practice the nearest integers to them would be used.

It follows at once that the minimized sampling variance is (39.41) with the second
term on the right omitted, i.e. on simplifying,

$$V(\hat\mu_{MV}) = \frac{1}{N^2}\left\{\frac{1}{n}\Big(\sum_{l} N_l\sigma_l\Big)^2 - \sum_{l} N_l\sigma_l^2\right\}. \qquad (39.43)$$

The sampling variance is not much affected by small variations of the $n_l$ from the values
defined by (39.42), as Exercise 39.21 shows.
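
To make the allocation rule concrete, the sketch below computes the fractional sample sizes of (39.42) and evaluates the variance (39.39) for any allocation. The function names and the numerical values in the usage note are illustrative assumptions only.

```python
def stratified_variance(N_l, sigma_l, n_l):
    """Sampling variance (39.39) for stratum sizes, s.d.s and sample sizes."""
    N = sum(N_l)
    return sum(Nl**2 * s**2 / nl * (1 - nl / Nl)
               for Nl, s, nl in zip(N_l, sigma_l, n_l)) / N**2


def mv_allocation(N_l, sigma_l, n):
    """Fractional n_l from (39.42): n_l proportional to N_l * sigma_l."""
    total = sum(Nl * s for Nl, s in zip(N_l, sigma_l))
    return [n * Nl * s / total for Nl, s in zip(N_l, sigma_l)]
```

With hypothetical strata of sizes 100, 300, 600 and standard deviations 10, 5, 2, a sample of $n = 50$ is allocated roughly as 13.5, 20.3, 16.2, and the resulting variance agrees with (39.43); a uniform sampling fraction (5, 15, 30) gives a larger value.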


39.17 Suppose, on the other hand, that we made all the $\pi_i$ equal, as in our original
discussion, by putting

$$\frac{n_l}{N_l} = \frac{n}{N}, \quad \text{all } l. \qquad (39.44)$$

It follows at once from (39.38) that in this case $\hat\mu$ reduces to the mean of the complete
sample. We denote it by $\hat\mu_{USF}$.

It is now easily seen that the second and third terms on the right of (39.41) are
equal in value but of opposite sign, and so cancel. The sampling variance resulting
from the USF allocation (39.44) is therefore given by the first term alone on the right
of (39.41). This will be greater than the minimized value unless the last term on
the right of (39.41) is zero, i.e. unless

$$\sigma_l = \sum_{j} N_j\sigma_j / N, \quad \text{all } l,$$

which requires that all stratum variances be equal. In this case, of course, (39.44)
and (39.42) agree.


39.18 We now compare the sampling variance under a USF allocation to strata,
as in 39.17, with that under equal-probabilities random sampling with no stratification,
given in (39.10), which we now rewrite(*)

$$V(m_R) = \frac{\sigma^2}{n}\left(1 - \frac{n}{N}\right) = \frac{N-n}{nN(N-1)}\left\{\sum_{l}(N_l - 1)\sigma_l^2 + \sum_{l} N_l(\mu_l - \mu)^2\right\};$$

this identity holds because of the Analysis of Variance identity

$$(N-1)\sigma^2 = \sum_{l}(N_l - 1)\sigma_l^2 + \sum_{l} N_l(\mu_l - \mu)^2,$$

which is (35.25) rewritten in another notation. We have seen in 39.17 that for a
USF allocation to strata,

$$V(\hat\mu_{USF}) = \frac{N-n}{nN^2}\sum_{l} N_l\sigma_l^2, \qquad (39.45)$$

and this is to be compared with $V(m_R)$. Their difference is

$$V(m_R) - V(\hat\mu_{USF}) = \frac{N-n}{nN}\left[\left\{\frac{\sum_l (N_l - 1)\sigma_l^2}{N-1} - \frac{\sum_l N_l\sigma_l^2}{N}\right\} + \frac{\sum_l N_l(\mu_l - \mu)^2}{N-1}\right]. \qquad (39.46)$$

The term in braces on the right of (39.46) is negative, since it equals

$$-\frac{1}{N(N-1)}\sum_{l}(N - N_l)\sigma_l^2 < 0. \qquad (39.47)$$

It therefore follows at once from (39.46) that if the last term on its right is zero, i.e.
if all $\mu_l$ are equal,

$$V(m_R) < V(\hat\mu_{USF}),$$

so that stratification with USF allocation results in an increase in sampling variance.
Furthermore, we have already seen in 39.17 that if the $\sigma_l$ are all equal, the USF is
the same as the MV allocation. Thus if the $\sigma_l$ are all equal and the $\mu_l$ all equal,

$$V(m_R) < V(\hat\mu_{USF}) = V(\hat\mu_{MV}). \qquad (39.48)$$

In these circumstances, any allocation of the sample over strata results in an increase
in the sampling variance of the estimator of the population mean.

Even if the $\mu_l$ differ slightly and the $\sigma_l$ differ slightly, a result like (39.48) is still
possible, i.e. we may have

$$V(m_R) < V(\hat\mu_{MV}) < V(\hat\mu_{USF}).$$

Armitage (1947) gives details, some of which are in Exercises 39.14-15.

(*) The suffix R denotes equal-probabilities random sampling without replacement.


39.19 Results like (39.48) very rarely occur in sampling practice, for they depend
upon the inequality (39.47). Now the left-hand side of (39.47) is of order $N^{-1}$ for fixed
$k$, whereas the other term inside the square brackets in (39.46), $\sum_l N_l(\mu_l - \mu)^2/(N-1)$,
is of order 1 if the $N_l$ are of the same order of magnitude as $N$. Thus, as $N \to \infty$ with
$N_l/N$ fixed, the term in braces on the right of (39.46) is relatively negligible, and the
other term is non-negative, so that we have

$$V(m_R) \ge V(\hat\mu_{USF}) \ge V(\hat\mu_{MV}), \qquad (39.49)$$

the equality on the left being attained if and only if all the $\mu_l$ are equal, and the equality
on the right if and only if all the $\sigma_l$ are equal.

(39.49) is just what our intuitive argument in 39.14 led us to expect. Both there
and more generally in 39.16-18 we have seen that it is the variation among the $\mu_l$
which produces the improvement in sampling variance through the use of a stratified
sample with a USF (and the variation among the $\sigma_l$ which produces any further improvement
due to MV allocation). This matches with the general conclusion at the
end of 39.14, where only a USF was discussed, that strata should be as different between
themselves as possible.
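
The three variances compared in (39.48) and (39.49) can be evaluated exactly for a small, fully known population. The sketch below is illustrative only (the function name and data are assumptions); variances use the divisors $N-1$ and $N_l-1$, in line with the Analysis of Variance identity of 39.18.

```python
def compare_designs(strata, n):
    """Return (V(m_R), V(mu_USF), V(mu_MV)) for a known population.

    strata is a list of lists of population values, one list per stratum,
    each with at least two members.
    """
    N_l = [len(s) for s in strata]
    N = sum(N_l)
    mu_l = [sum(s) / len(s) for s in strata]
    var_l = [sum((v - m) ** 2 for v in s) / (len(s) - 1)
             for s, m in zip(strata, mu_l)]
    all_values = [v for s in strata for v in s]
    mu = sum(all_values) / N
    var = sum((v - mu) ** 2 for v in all_values) / (N - 1)
    within = sum(a * b for a, b in zip(N_l, var_l))         # sum of N_l * sigma_l^2
    sd_sum = sum(a * b**0.5 for a, b in zip(N_l, var_l))    # sum of N_l * sigma_l
    v_r = var / n * (1 - n / N)                             # (39.10)
    v_usf = (N - n) / (n * N**2) * within                   # (39.45)
    v_mv = (sd_sum**2 / n - within) / N**2                  # (39.43)
    return v_r, v_usf, v_mv
```

With well-separated strata such as `[[1, 2, 3, 4], [10, 12, 14, 16], [30, 35, 40, 45]]` the ordering of (39.49) appears; with two identical strata `[[1, 2, 3], [1, 2, 3]]` the reversed inequality of (39.48) appears, since the $\mu_l$ and the $\sigma_l$ are then all equal.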

The use of strata in sample survey designs has obvious similarity to the use of
blocks in experiment design (cf. 38.14). In each case, the grouping (of individuals, 
of experimental units) has as its aim the elimination from error (sampling, experimental) 
of the variation between groups (strata, blocks). There is, however, a difference of 
purpose as well as a similarity of method. In surveys, we are interested primarily 
in estimating the overall population mean, while the general mean is rarely of interest 
in experimental situations. This difference is a reflection of the fact that (cf. 38.3) 
experiments are concerned with hypothetical, rather than existent, populations. 


Minimum variance allocation for fixed total cost 

39.20 Before leaving the question of sample allocation to strata, we generalize
the MV allocation formula (39.42) to take account of variation in costs of sampling
between strata. We have deferred this generalization because we may now deduce
it as a special case of a general result on minimum variance allocation for fixed total
cost, which we shall also find useful in other connexions.

Suppose that in a given sampling problem the sampling variance of the estimator
being used is of the form

$$V = v_0 + \sum_{l=1}^{k}\frac{v_l}{w_l}, \qquad (39.50)$$

where $v_0$ and the $v_l$ are functions of population quantities only and the $w_l$ are not
functions of the $v$'s. Quite commonly the total cost of carrying out the sample survey
is representable in the form

$$C = c_0 + \sum_{l=1}^{k} w_l c_l, \qquad (39.51)$$

where $c_0$ is appropriately labelled "overhead cost," and $c_l$ is a cost within an $l$th
category. We now write

$$(V - v_0)(C - c_0) = \left(\sum_{l}\frac{v_l}{w_l}\right)\left(\sum_{l} w_l c_l\right) \ge \left\{\sum_{l}(v_l c_l)^{1/2}\right\}^2 \qquad (39.52)$$

by the Cauchy inequality (see 2.7). The equality in (39.52) is attained if and only
if the condition

$$\left(\frac{v_l}{w_l}\right)\Big/(w_l c_l) = \text{constant}, \quad \text{all } l,$$

holds. The extreme right-hand side of (39.52) is independent of the $w_l$, so choice of
the $w_l$ to satisfy this condition, which we rewrite

$$w_l^2 \propto v_l/c_l, \quad \text{all } l, \qquad (39.53)$$

will minimize $(V - v_0)(C - c_0)$, i.e. it will minimize $V$ for fixed $C$ (or $C$ for fixed $V$).


39.21 In our present application, the variance is given by (39.39), and the cost
function is

$$C = c_0 + \sum_{l=1}^{k} n_l c_l, \qquad (39.54)$$

where $c_0$ is the overhead cost and $c_l$ is the cost of an observation in the $l$th stratum.
Identifying (39.39) with (39.50) we see that here

$$v_0 = -\frac{1}{N^2}\sum_{l} N_l\sigma_l^2, \qquad v_l = \frac{N_l^2\sigma_l^2}{N^2}, \qquad w_l = n_l,$$

and hence (39.53) gives the MV allocation for fixed $C$ as

$$n_l \propto \frac{N_l\sigma_l}{\sqrt{c_l}}. \qquad (39.55)$$

The sampling fraction $n_l/N_l$ is now to be made proportional to the square root of the
stratum variance divided by the square root of the stratum cost per observation. (39.42),
our previous MV allocation formula ignoring costs, is of course a special case of (39.55)
when all $c_l$ are equal.
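
A hedged sketch of the cost-constrained allocation (39.55) follows. The function name and the `budget` and `c0` parameters are assumptions made for illustration; the sizes are scaled so that the cost function (39.54) equals the stated budget, and they are left fractional.

```python
from math import sqrt


def cost_allocation(N_l, sigma_l, c_l, budget, c0=0.0):
    """Fractional n_l proportional to N_l * sigma_l / sqrt(c_l), per (39.55).

    The n_l are scaled so that c0 + sum(n_l * c_l) equals the budget.
    """
    weights = [Nl * s / sqrt(c) for Nl, s, c in zip(N_l, sigma_l, c_l)]
    scale = (budget - c0) / sum(w * c for w, c in zip(weights, c_l))
    return [scale * w for w in weights]
```

Any other allocation with the same total cost should give a larger value of $\sum_l N_l^2\sigma_l^2/n_l$, the only part of (39.39) that depends on the $n_l$; comparing against an allocation proportional to $N_l$ is a simple check.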


Kokan and Khan (1967) give a numerical method for minimizing C when several 
variables are being investigated, each of which is to be estimated with a specified maximum 
variance. 


The formation of strata 

39.22 The reader will no doubt have noticed that throughout our discussion from 
39.15 onwards, we have been assuming that $k$ fixed strata are given to us in advance,
and that our only problem has been how to allocate our sample over these strata. But
these strata must have been formed at some stage. How is this best done from the 
standpoint of ultimately minimizing variance in estimation? We first discuss this with 
k fixed, and later consider the effect of varying k. 

In view of the conclusion of 39.19, the aim in forming strata must be to maximize 
the variation between the stratum means. Thus, if we knew the distribution of the 
variable y in the population, we should select (k-1) cutting points within its range
to form k strata. How should these cutting points be chosen? The problem is
theoretical, in the sense that we never in practice know the true distribution of y; 
however, we may have past values of the variable to guide us, or the values of a variable 
highly correlated with y may be known, so that the solution to the problem is of practical 
interest. The basic results are due to Dalenius (1950). 


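39.23 First, consider the case where a USF is to be used. Ignoring constants
$n$, $N$, and neglecting the difference between $N_l$ and $N_l - 1$, we rewrite the sampling
variance (39.45) as

$$V(\hat\mu_{USF}) \propto V' = \sum_{l=1}^{k}\int_{c_{l-1}}^{c_l}(y - \mu_l)^2 f(y)\,dy, \qquad (39.56)$$

where the distribution of $y$ in the population is represented by $f(y)$, $a \le y \le b$, and
$a = c_0 < c_1 < c_2 < \ldots < c_{k-1} < c_k = b$, so that $c_1, \ldots, c_{k-1}$ are the cutting points in the
range of $y$ which determine the strata boundaries. To minimize (39.56) for choice
of the $c$'s, we put

$$0 = \frac{\partial V'}{\partial c_l} = \left\{(c_l - \mu_l)^2 - (c_l - \mu_{l+1})^2\right\} f(c_l), \quad l = 1, 2, \ldots, k-1$$

(the terms arising from the dependence of $\mu_l$ and $\mu_{l+1}$ on $c_l$ vanish, since
$\int (y - \mu_l) f(y)\,dy = 0$ over each stratum), so that if $f(c_l) \ne 0$, we have the solution

$$(c_l - \mu_l)^2 = (c_l - \mu_{l+1})^2,$$

and since

$$\mu_l < c_l < \mu_{l+1},$$

this implies

$$c_l = \tfrac{1}{2}(\mu_l + \mu_{l+1}). \qquad (39.57)$$

We therefore choose our cutting points so that they are half-way between the means
of the strata they form. Given $f(y)$ and $k$, this is not difficult to achieve numerically.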
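
One simple way to satisfy condition (39.57) numerically, when a list of past values stands in for $f(y)$, is a fixed-point iteration: form strata from trial cutting points, recompute the stratum means, and move each cutting point to the mid-point of adjacent means. This is an assumed illustrative procedure (essentially the one-dimensional form of what is now usually called Lloyd's algorithm), not one described in the text, and it may converge to a local rather than a global minimum.

```python
def usf_cut_points(values, k, iterations=100):
    """Iterate c_l = (mu_l + mu_{l+1}) / 2, condition (39.57), on given data."""
    ys = sorted(values)
    lo, hi = ys[0], ys[-1]
    cuts = [lo + (hi - lo) * l / k for l in range(1, k)]  # equal-width start
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in ys:
            # stratum index = number of cutting points lying below v
            groups[sum(v > c for c in cuts)].append(v)
        if any(not g for g in groups):
            break  # an empty stratum: stop with the current cutting points
        means = [sum(g) / len(g) for g in groups]
        new_cuts = [(means[l] + means[l + 1]) / 2 for l in range(k - 1)]
        if new_cuts == cuts:
            break  # fixed point reached
        cuts = new_cuts
    return cuts
```

For instance, the values 1, 2, 3, 11, 12, 13 with $k = 2$ give stratum means 2 and 12 and hence a single cutting point half-way between them.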


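39.24 If on the other hand, we are to use the MV allocation sample sizes after
the strata are formed, (39.43) is to be minimized by choice of the cutting points. The
reader can verify by substituting (39.42) into (39.39) that the second term in braces
in (39.43) arises only if the sampling fractions $n_l/N_l$ are not negligible. We neglect
these fractions. Ignoring constants $n$, $N$, (39.43) is then rewritten

$$V(\hat\mu_{MV}) \propto D^2 = \left[\sum_{l=1}^{k}\left\{\int_{c_{l-1}}^{c_l} f\,dy \int_{c_{l-1}}^{c_l}(y - \mu_l)^2 f\,dy\right\}^{1/2}\right]^2. \qquad (39.58)$$

We need only minimize $D$, so that we put

$$0 = \frac{\partial D}{\partial c_l} = \frac{\partial}{\partial c_l}\left[\left\{\int_{c_{l-1}}^{c_l} f\,dy \int_{c_{l-1}}^{c_l}(y - \mu_l)^2 f\,dy\right\}^{1/2} + \left\{\int_{c_l}^{c_{l+1}} f\,dy \int_{c_l}^{c_{l+1}}(y - \mu_{l+1})^2 f\,dy\right\}^{1/2}\right]$$

$$= \frac{f(c_l)\int_{c_{l-1}}^{c_l}(y - \mu_l)^2 f\,dy + \int_{c_{l-1}}^{c_l} f\,dy\,(c_l - \mu_l)^2 f(c_l)}{2\left\{\int_{c_{l-1}}^{c_l} f\,dy \int_{c_{l-1}}^{c_l}(y - \mu_l)^2 f\,dy\right\}^{1/2}}
- \frac{f(c_l)\int_{c_l}^{c_{l+1}}(y - \mu_{l+1})^2 f\,dy + \int_{c_l}^{c_{l+1}} f\,dy\,(c_l - \mu_{l+1})^2 f(c_l)}{2\left\{\int_{c_l}^{c_{l+1}} f\,dy \int_{c_l}^{c_{l+1}}(y - \mu_{l+1})^2 f\,dy\right\}^{1/2}},$$

$$l = 1, 2, \ldots, k-1.$$

If $f(c_l) \ne 0$, this reduces, on cancelling a factor $N_l$ throughout, and returning to our
original notation, to

$$\frac{\sigma_l^2 + (c_l - \mu_l)^2}{\sigma_l} = \frac{\sigma_{l+1}^2 + (c_l - \mu_{l+1})^2}{\sigma_{l+1}}. \qquad (39.59)$$

If $\sigma_l^2$ is the same for all strata, (39.59) reduces to (39.57), as it must, but in general these
conditions are not so easily satisfied as (39.57), since variances as well as means of
strata are involved (as we should expect). We therefore seek approximations to (39.59).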


39.25 If $k$ is large, we may assume $f(y)$ to be constant within each stratum, say
equal to $f_l$. Then

$$\frac{N_l}{N} \approx \int_{c_{l-1}}^{c_l} f(y)\,dy = f_l\,(c_l - c_{l-1})$$

and

$$\sigma_l^2 \approx \tfrac{1}{12}(c_l - c_{l-1})^2,$$

the variance of a uniform distribution. Thus the expression to be minimized in (39.43)
is, to the same order as in 39.24, proportional to

$$\frac{1}{N}\sum_{l} N_l\sigma_l \approx \frac{1}{\sqrt{12}}\sum_{l} f_l\,(c_l - c_{l-1})^2 = \frac{1}{\sqrt{12}}\sum_{l}\left\{f_l^{1/2}(c_l - c_{l-1})\right\}^2. \qquad (39.60)$$

If we now define the transformation

$$z(y) = \int_{a}^{y}\{f(t)\}^{1/2}\,dt, \qquad (39.61)$$

(39.60) may be rewritten

$$\frac{1}{N}\sum_{l} N_l\sigma_l \approx \frac{1}{\sqrt{12}}\sum_{l}\{z(c_l) - z(c_{l-1})\}^2.$$

We therefore require to minimize a sum of squares of quantities $z(c_l) - z(c_{l-1})$, whose
sum is fixed at $z(b) - z(a)$. The solution is to make $z(c_l) - z(c_{l-1})$ constant. Thus
we apply the transformation (39.61) and determine the cutting points as the roots
of the equation

$$z(c_l) = \frac{l}{k}\,z(b), \quad l = 1, 2, \ldots, k-1. \qquad (39.62)$$

Dalenius and Hodges (1957, 1959), to whom this approximation is due, show how to
use it numerically and obtain a closer approximation. An alternative approximation
given by Ekman (1959) is derivable by writing, more simply,

$$\frac{1}{N}\sum_{l} N_l\sigma_l \approx \frac{1}{N\sqrt{12}}\sum_{l} N_l(c_l - c_{l-1}). \qquad (39.63)$$

It follows from our discussion above that $\sum_l \{N_l(c_l - c_{l-1})\}^{1/2}$ is constant and we therefore
minimize (39.63) by making $\{N_l(c_l - c_{l-1})\}^{1/2}$ constant, or equivalently

$$N_l(c_l - c_{l-1}) = \text{constant}. \qquad (39.64)$$

Cochran (1961) examined numerically the use of approximations (39.62) and (39.64)
for small $k$ (equal to 2, 3 or 4) and found that they performed consistently well when
applied to eight representative skew distributions. He also discusses other, less
satisfactory, approximations to (39.59).
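
The rule (39.61)-(39.62) is straightforward to apply when $f(y)$ is available only as a grouped frequency table. The sketch below is an assumed illustration, not the numerical method of Dalenius and Hodges themselves: it treats $f$ as constant within each class, so that $z(y)$ is piecewise linear, and interpolates linearly for the cutting points.

```python
from math import sqrt


def cum_sqrt_f_cuts(edges, counts, k):
    """Approximate cutting points from (39.61)-(39.62) using grouped data.

    edges holds the class boundaries (len(counts) + 1 values) and counts
    the class frequencies; at least one count must be positive.
    """
    z = [0.0]
    for left, right, c in zip(edges, edges[1:], counts):
        f = c / (right - left)                     # density up to a constant
        z.append(z[-1] + sqrt(f) * (right - left)) # cumulative sqrt(f)
    cuts = []
    for l in range(1, k):
        target = z[-1] * l / k                     # z(c_l) = (l/k) z(b)
        i = max(j for j in range(len(counts)) if z[j] <= target)
        frac = (target - z[i]) / (z[i + 1] - z[i])
        cuts.append(edges[i] + frac * (edges[i + 1] - edges[i]))
    return cuts
```

Scaling the counts by a constant leaves the cutting points unchanged, since only ratios of $z$ values enter (39.62). For a skew table with classes $[0,1)$, $[1,2)$, $[2,3)$ and frequencies 9, 4, 1, the cumulative values of $\sqrt{f}$ are 3, 5, 6, so two strata are divided at $y = 1$.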

It should be noted that each of the approximate solutions (39.62) and (39.64)
implies that $N_l\sigma_l$ is to be constant over all strata. But if this were so, the MV allocation
formula (39.42) shows that we should have $n_l$ constant in every stratum. Thus
we can expect a stratification with equal sample size in each stratum to give a variance
nearly the minimum possible if the cutting points between strata are chosen to minimize
(39.39). The derivation of these is left to the reader as Exercise 39.18; Cochran
(1961) found that they differ very little from those obtained by use of (39.62) or (39.64).


S. P. Ghosh (1963b) extends the analysis of the problem of locating stratum boundaries 
to the case where the strata are based on the values of two (correlated) variables. 
Random formation of strata is discussed in Exercises 40.4-6.


39.26 If we always sample with a USF or with a MV allocation, and stratum sizes
are large, (39.49) assures us that we can never increase variance by subdividing strata to
form sub-strata, so that one is led logically to the conclusion that a sample of $n$ observations
should be selected with $k = n$ strata, one observation from each; or, if we want
to use (39.40) to estimate sampling variance (requiring a minimum of 2 observations
in each stratum) with $k = [\tfrac{1}{2}n]$ strata. When $n$ is small, and we have fairly detailed
knowledge of the underlying distribution, there is a good deal to be said for doing this;
but otherwise the labour involved is hardly likely to be justified, for there is a good deal
of empirical evidence that as $k$ increases, the minimum attainable variance declines
more and more slowly, and that very often $k$ = 2, 3 or 4 is nearly as good as the best.
This is due to the fact that our knowledge of the underlying distribution is usually
rather imprecise. Cochran (1963) and Dalenius (1953) give numerical examples.
A detailed empirical study of the effect of strata formation and sample allocation on
estimator variance was made by Hess et al. (1966).


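39.27 Finally, we remark that the effect of any stratification upon sampling variance
can always be estimated after sampling. To do this, we need only use (39.40) to estimate
the variance of $\hat\mu$ and compare this with an estimate of the variance of $m_R$, given in
suitable form at the beginning of 39.18. From that formula, we see that the only
problem is to estimate

$$B = \sum_{l} N_l(\mu_l - \mu)^2
= \sum_{l} N_l\mu_l^2 - \frac{1}{N}\Big(\sum_{l} N_l\mu_l\Big)^2
= \frac{1}{N}\left\{\sum_{l} N_l(N - N_l)\mu_l^2 - \sum_{l \ne p}\sum N_l N_p\,\mu_l\mu_p\right\}. \qquad (39.65)$$

Now because

$$E(m_l^2) - \mu_l^2 = V(m_l) = \frac{\sigma_l^2}{n_l}\left(1 - \frac{n_l}{N_l}\right),$$

we have

$$E\left\{m_l^2 - \frac{s_l^2}{n_l}\left(1 - \frac{n_l}{N_l}\right)\right\} = \mu_l^2,$$

while $E(m_l m_p) = \mu_l\mu_p$ for $l \ne p$ because the strata are sampled independently,
and hence, from (39.65), an unbiassed estimator of $B$ is

$$\hat B = \frac{1}{N}\left\{\sum_{l} N_l(N - N_l)\left[m_l^2 - \frac{s_l^2}{n_l}\left(1 - \frac{n_l}{N_l}\right)\right] - \sum_{l \ne p}\sum N_l N_p\,m_l m_p\right\}$$

$$= \sum_{l} N_l m_l^2 - \frac{1}{N}\Big(\sum_{l} N_l m_l\Big)^2 - \frac{1}{N}\sum_{l}(N - N_l)(N_l - n_l)\frac{s_l^2}{n_l}. \qquad (39.66)$$

The estimator of unstratified random sampling variance is therefore, from 39.18,

$$\hat V(m_R) = \frac{N-n}{nN(N-1)}\left\{\sum_{l}(N_l - 1)s_l^2 + \hat B\right\}, \qquad (39.67)$$

where $\hat B$ is defined by (39.66).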
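
A sketch of the calculation in (39.66) and (39.67) is given below; the function name and data layout are illustrative assumptions. Its result may be set against the stratified variance estimate (39.40) computed from the same sample to gauge the gain from stratification.

```python
def unstratified_variance_estimate(samples, stratum_sizes):
    """Estimate V(m_R) from a stratified sample via (39.66) and (39.67).

    samples[l] lists the sample values from stratum l (at least two per
    stratum) and stratum_sizes[l] is N_l.
    """
    N_l = stratum_sizes
    N = sum(N_l)
    n_l = [len(s) for s in samples]
    n = sum(n_l)
    m_l = [sum(s) / len(s) for s in samples]
    s2_l = [sum((v - m) ** 2 for v in s) / (len(s) - 1)
            for s, m in zip(samples, m_l)]
    weighted = sum(a * m for a, m in zip(N_l, m_l))
    b_hat = (sum(a * m**2 for a, m in zip(N_l, m_l))
             - weighted**2 / N
             - sum((N - a) * (a - b) * s2 / b
                   for a, b, s2 in zip(N_l, n_l, s2_l)) / N)   # (39.66)
    within = sum((a - 1) * s2 for a, s2 in zip(N_l, s2_l))
    return (N - n) / (n * N * (N - 1)) * (within + b_hat)      # (39.67)
```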


Sample designs: clustering 

39.28 We were led to the principle of stratification by our discussion in 39.14 
of the effects of varying the probabilities луу while the л; were all fixed at n/N; we saw 
there that it would be profitable to increase some луу slightly (and reduce the others 
to compensate this) as compared with their equal-probabilities sampling values. We 
now ask whether it may not be worth while to press this further and make some of the 
ay as large as possible. From their definitions in 39.6 we see that z;2 7; & zt; always, 
so that if all л; = n/N, z;&n/N. 

Suppose, then, that we divide the $N$ individuals in the population into $N_1$ groups,
each containing $N_2$ individuals (so that $N_1 N_2 = N$), and that for all pairs $i, j$ within
any group we put $\pi_{ij} = \pi_i = \pi_j = n/N$. There are $N_2(N_2-1)$ pairs within each
group, and hence $N_1 N_2 (N_2-1) = N(N_2-1)$ pairs $i, j$ for which $\pi_{ij}$ is thus increased.
From (39.16), all the $N(N-1)$ $\pi_{ij}$ in the population must add to $n(n-1)$, so that the
$N(N-1) - N(N_2-1) = N(N-N_2)$ pairs $i, j$ whose $\pi_{ij}$ has not been increased must
be allotted values of $\pi_{ij}$ adding to

$$n(n-1) - N(N_2-1)\cdot\frac{n}{N} = n(n-N_2).$$

If we make all these values of $\pi_{ij}$ equal, each will have value

$$\frac{n(n-N_2)}{N(N-N_2)}.$$

Suppose that we choose $n$ to be a multiple of $N_2$, say $n_1 N_2$. Then these

$$\pi_{ij} = \frac{n_1(n_1-1)}{N_1(N_1-1)}.$$

Now we recognize from 39.6 that this is the value which $\pi_{ij}$ would have in sampling
$n_1$ units out of $N_1$ using equal-probabilities random sampling. If we realize that the
equality of $\pi_{ij}$ and $\pi_i$ within groups implies that each group is either selected as a whole
or not at all, we see that our present sample design consists simply of dividing the population
into $N_1$ equal groups (called clusters) and selecting $n_1$ of these with equal probabilities at
random.
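
The argument of 39.28 can be checked numerically on a small case. The following is a minimal sketch, not part of the original text: the function name and the labelling of individuals (cluster c holds positions c*N2 to c*N2+N2-1) are illustrative assumptions. It enumerates every equal-probability choice of n1 clusters out of N1 and computes the selection probabilities exactly.

```python
from fractions import Fraction
from itertools import combinations


def cluster_inclusion_probabilities(N1, N2, n1):
    """Exact pi_i and pi_ij when n1 of N1 equal clusters (size N2) are
    chosen with equal probabilities. Labels are hypothetical: individual
    c*N2 + r is the r-th member of cluster c."""
    N = N1 * N2
    samples = list(combinations(range(N1), n1))
    weight = Fraction(1, len(samples))  # each set of clusters equally likely
    pi = [Fraction(0)] * N
    pij = {}
    for chosen in samples:
        members = [c * N2 + r for c in chosen for r in range(N2)]
        for a in members:
            pi[a] += weight
            for b in members:
                if a != b:
                    pij[(a, b)] = pij.get((a, b), Fraction(0)) + weight
    return pi, pij


pi, pij = cluster_inclusion_probabilities(N1=4, N2=3, n1=2)
n, N = 2 * 3, 4 * 3
print(pi[0])                          # overall probability n/N
print(pij[(0, 1)])                    # same cluster: n/N
print(pij[(0, 3)])                    # different clusters: n1(n1-1)/{N1(N1-1)}
print(sum(pij.values()) == n * (n - 1))
```

In this example the within-cluster joint probabilities equal n/N and the remaining ones equal n1(n1-1)/{N1(N1-1)}, as the text derives.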


39.29 A special case of cluster sampling is systematic sampling, in which the population
is arranged (either physically or by means of a list) in a single sequence of $N = N_1 N_2$
individuals. From among the first $N_1$ of these, a single individual is selected at
random with equal probabilities of selection. If the $p$th individual is thus selected,
the systematic sample consists of the individuals in positions $p, p+N_1, p+2N_1, \ldots,
p+(N_2-1)N_1$. Thus only $N_1$ samples, each of size $N_2$, are possible and there is a
complete formal identity between systematic sampling and the cluster sampling described
in 39.28, but a few special features serve to differentiate them in the literature.

First, the fact that systematic sampling employs clusters which are not physically
contiguous strongly contrasts it with other forms of clustering which (as the name
implies) use sets of "neighbouring" individuals in the physical population. Second,
systematic sampling from lists which are effectively in random order is often used as
a practically easier substitute for equal-probabilities random sampling; the method is
in these circumstances sometimes known as quasi-random sampling. But the most
important differentiating factor is that in systematic sampling we commonly find that
only one of the $N_1$ possible samples is selected, i.e. $n_1 = 1$. This immediately renders
the estimation of sampling variance impossible without the availability of supplementary
information of some kind. It seems simpler to insist that $n_1 \ge 2$ so that valid
estimation of sampling variance is possible.
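
The selection rule of 39.29 can be sketched as follows. This is an illustrative sketch only: the function name and the `rng` parameter are assumptions, and positions are numbered from 1 as in the text. With n1 >= 2 distinct random starts, more than one of the N1 possible systematic samples is drawn.

```python
import random


def systematic_samples(N1, N2, n1=1, rng=random):
    """Draw n1 distinct starting positions p from 1..N1 with equal
    probabilities and return, for each, the systematic sample
    p, p+N1, ..., p+(N2-1)*N1 from a list of N1*N2 individuals."""
    starts = rng.sample(range(1, N1 + 1), n1)
    return [[p + k * N1 for k in range(N2)] for p in starts]


# Two of the five possible systematic samples from a list of 20 individuals.
for sample in systematic_samples(N1=5, N2=4, n1=2, rng=random.Random(0)):
    print(sample)
```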

Discussions and bibliographies of systematic sampling are given by Cochran (1963) 
and Yates (1960), both of whom have contributed notably to its theory. 


39.30 What effect will cluster sampling have on $V(\hat\mu)$ at (39.23)? The contributions
from the pairs $i, j$ within the same clusters will now actually be negative, since

$$\pi_i \pi_j - \pi_{ij} = -\frac{n}{N}\left(1-\frac{n}{N}\right)$$

for these pairs. The contributions from the other pairs, in different clusters, will be
positive, since for them

$$\pi_i \pi_j - \pi_{ij} = \frac{n_1(N_1-n_1)}{N_1^2(N_1-1)}.$$

The argument of 39.14 now applies: if we put the large values of $|y_i - y_j|$ in the same cluster
with $\pi_{ij}$ maximized, we should expect (39.23) to be reduced, and perhaps more dramatically
than in 39.14, since the larger values of $\pi_{ij}$ now operate to reduce (39.23), instead
of merely contributing nothing to it as in 39.14.

We thus arrive at a general principle of cluster-construction which operates in exactly
the opposite direction from the principle of stratification we have discussed in 39.14:
form the population into internally heterogeneous groups to reduce cluster sampling
variance below equal-probabilities random sampling variance.

As with stratification, we shall now abandon the general framework and enter into
a particularized discussion of the details, but we make two general points here.

The primary distinction between stratification and clustering is that every stratum
is sampled, while clusters themselves are subject to a selection procedure; it is this
fact which leads the principles for the two methods in opposite directions. In stratified
sampling, the sampling variability is confined within strata and we construct strata to
minimize within-strata variability; in cluster sampling, there is only between-cluster
variability, since every cluster is sampled entire, and we construct clusters to minimize
this.

Secondly, it is worth while emphasizing here, although it is explicit in 39.14 and
39.28, that neither stratified sampling with a USF, nor cluster sampling with all
clusters of equal size, makes any change in the overall selection probabilities $\pi_i$, which
are $n/N$ in each case, just as for equal-probabilities random sampling; these methods
operate purely by modifying the joint selection probabilities $\pi_{ij}$. Of course, in stratified
sampling with variable sampling fractions, the $\pi_i$ themselves are changed; and the
same applies to cluster sampling when clusters are of unequal size. We shall discuss
this more general situation below, and at the same time we find it convenient to generalize
our discussion in another direction.


Multi-stage sampling 

39.31 Cluster sampling presupposes a grouping of the population members, 
some of the groups then being selected. It is natural to consider the more general
situation where the groups (clusters) are the subject of further sampling at a later stage. 
Retaining the notation of 39.28 as far as possible, we now formulate this more general 
situation. 

The population of $N$ members is grouped into $N_1$ first-stage units (previously
called clusters). The $i$th such unit contains $N_{i2}$ second-stage units, the $j$th of which
contains $N_{ij3}$ third-stage units. This hierarchical process can be continued indefinitely,
but we shall not consider more than three stages, and indeed it is sometimes enough
for our purposes to consider only two stages.

With three stages, we have

$$N = \sum_{i=1}^{N_1}\sum_{j=1}^{N_{i2}} N_{ij3}.$$

At the first stage, $n_1$ (out of $N_1$) first-stage units are selected, by a method as yet
unspecified. Within the $i$th of these, $n_{i2}$ second-stage units are selected, from the $j$th
of which $n_{ij3}$ third-stage units are selected, and sample size is

$$n = \sum_{i=1}^{n_1}\sum_{j=1}^{n_{i2}} n_{ij3}.$$

We assume that sampling at any stage may be with unequal probabilities; that selection
at any stage is independent of selection at other stages, and that sampling within any
unit at a given stage is independent of the sampling within other units at that stage.


39.32 We first have to determine the form of the unbiassed estimator. The
general theory of selection with unequal probabilities in 39.6-8 applies here, and in
particular, (39.20) gives an unbiassed estimator $\hat\mu$, while (39.23) and (39.24) remain
valid expressions for the sampling variance of $\hat\mu$, and for an estimator of it. The
$\pi_i$ and $\pi_{ij}$ in these expressions must now refer, of course, to overall probabilities of
selection, taking account of all stages of selection. We now relabel the values $y_i$ in
the population as $y_{ijk}$, each suffix corresponding to a division into the sampling units
at a stage, so that, for example, $y_{846}$ is a population value in the 8th first-stage unit,
in the 4th second-stage unit within that first-stage unit and in the 6th third-stage unit
within that again. In this notation, the unbiassed estimator (39.20) becomes

$$\hat\mu = \frac{1}{N}\sum_{i=1}^{n_1}\sum_{j=1}^{n_{i2}}\sum_{k=1}^{n_{ij3}} \frac{y_{ijk}}{\pi_{(ijk)}}, \tag{39.68}$$


where $\pi_{(ijk)}$ is now(*) the overall probability of selection of the single value $y_{ijk}$.

(*) The parentheses in the suffix are to prevent confusion with the joint probabilities $\pi_{ij}$
previously used.

For example, suppose that sampling is with equal probabilities at every stage.
Then

$$\pi_{(ijk)} = \frac{n_1}{N_1}\cdot\frac{n_{i2}}{N_{i2}}\cdot\frac{n_{ij3}}{N_{ij3}},$$

and (39.68) reduces to

$$\hat\mu = \frac{N_1}{N n_1}\sum_{i=1}^{n_1}\frac{N_{i2}}{n_{i2}}\sum_{j=1}^{n_{i2}}\frac{N_{ij3}}{n_{ij3}}\sum_{k=1}^{n_{ij3}} y_{ijk}. \tag{39.69}$$

If, further, $N_{i2} = N_2$, all $i$, and $N_{ij3} = N_3$, all $i, j$, while $n_{i2} = n_2$ and $n_{ij3} = n_3$ similarly,
(39.69) reduces to

$$\hat\mu = \frac{N_1 N_2 N_3}{N n_1 n_2 n_3}\sum_{i}\sum_{j}\sum_{k} y_{ijk} = m, \tag{39.70}$$

the overall sample mean, since in this case $N = N_1 N_2 N_3$, $n = n_1 n_2 n_3$. (39.70) is
intuitively obvious from the symmetry of this situation.


39.33 Similarly, if there are only two stages of sampling, we drop the suffix $k$
and its summation in (39.68), and the equal-probabilities estimator is (39.69) with
$(N_{ij3}/n_{ij3})\sum_k y_{ijk}$ replaced by $y_{ij}$, reducing to $m$ in the symmetrical case as at (39.70).

We can formally derive many two-stage from three-stage formulae by putting $N_{ij3} =
n_{ij3} = 1$, or putting $N_1 = n_1 = 1$. This has the effect of making one suffix (and
hence its summation) redundant.
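
A short computational sketch of the equal-probabilities estimator (39.69) may help fix the notation. The nested-list layout of the sample and the function name are assumptions made for illustration; only the formula itself comes from the text.

```python
def three_stage_estimate(N, N1, sample):
    """Estimator (39.69) under equal-probabilities sampling at every stage.

    `sample` (a hypothetical layout) has one entry per selected first-stage
    unit: a pair (N_i2, units), where `units` has one entry per selected
    second-stage unit: a pair (N_ij3, list of observed y-values).
    """
    n1 = len(sample)
    total = 0.0
    for N_i2, units in sample:
        n_i2 = len(units)
        inner = 0.0
        for N_ij3, ys in units:
            # Scale the observed total up to the whole second-stage unit.
            inner += N_ij3 / len(ys) * sum(ys)
        # Scale up to the whole first-stage unit.
        total += N_i2 / n_i2 * inner
    return N1 / (N * n1) * total


# Symmetrical case: N1 = 4, N2 = 3, N3 = 5, so N = 60; n1 = n2 = n3 = 2.
sample = [
    (3, [(5, [1, 2]), (5, [3, 4])]),
    (3, [(5, [5, 6]), (5, [7, 8])]),
]
print(three_stage_estimate(N=60, N1=4, sample=sample))  # the sample mean, 4.5
```

In the symmetrical case the result coincides with the overall sample mean, as (39.70) states.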


39.34 Just as in our treatment of stratified sampling, we shall find it more convenient
to make a direct approach in discussing the sampling variance of our estimator
in multi-stage sampling, rather than to persist with the general unequal-probabilities
notation; this will avoid, for example, the use of symbols like $\pi_{(ijk)(lmn)}$ for the joint
probability of selecting two values $y_{ijk}$, $y_{lmn}$. We shall consider in detail two special
cases, the first of which is that of sampling with equal probabilities at every stage.


39.35 Consider, then, the sampling variance of the estimator $\hat\mu$ at (39.69). It
is obvious that each stage of sampling contributes to the variability of $\hat\mu$. Since
sampling is independent at the different stages, we divide the variation of $\hat\mu$ into a
sequence of conditional variations. First, we consider its variance at the last stage,
conditional upon earlier-stage selections being fixed; then we allow the penultimate-stage
selections to vary, and so on until the first-stage selections are varied. With
three stages, for example, we take the variance at the third stage, conditional upon the
first two stages' selections being fixed, then we allow the second-stage selections to
vary, and finally allow the first-stage selections to vary. Symbolically, we write this
process

$$V(\hat\mu) = E(\hat\mu-\mu)^2 = E_1[E_2\{E_3(\hat\mu-\mu)^2\}]. \tag{39.71}$$


To evaluate (39.71), we shall make use of a general result concerning the variance
of a random variable, special cases of which we have already used in earlier chapters.
We now prove the result in general.

39.36 Consider a random variable $x$, and let $c$ be a condition upon the distribution
of $x$. By the multiplication theorem of probability, $P(x) = P_1(x|c)P_2(c)$, where
$P$, $P_1$ and $P_2$ are the probabilities of their arguments. Thus

$$E(x) = E_c\{E(x|c)\}, \tag{39.72}$$

expressing symbolically the fact that in finding the expectation of $x$ we may first impose
any condition, find the expectation of $x$ given that condition, and then remove the
effect of the condition by taking the expectation of the conditional expectation itself.
The variance of $x$ is obtained by considering the identity

$$E_c[E(x^2|c) - \{E(x|c)\}^2] + E_c[\{E(x|c)\}^2] - [E_c\{E(x|c)\}]^2
= E_c[E(x^2|c)] - [E_c\{E(x|c)\}]^2
= E(x^2) - \{E(x)\}^2,$$

by (39.72). By definition, the first term on the left-hand side is the expectation of the
conditional variance of $x$ given $c$; the remaining two terms on the left together are the
variance of the conditional expectation of $x$ given $c$; and the extreme right-hand side is
the unconditional variance of $x$. Thus, symbolically, we have

$$V(x) = E_c\{V(x|c)\} + V_c\{E(x|c)\}; \tag{39.73}$$

the unconditional variance is the mean of the conditional variance plus the variance
of the conditional mean. Note that if $E(x|c)$ does not depend upon $c$, the second term
in (39.73) is zero, and $V(x)$ is simply the mean of the conditional variance.

The result is quite general: for example, it was, in effect, used in 17.35 to establish
the Rao-Blackwell method of improving estimators through sufficient statistics.
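
The decomposition (39.73) can be verified exactly on any small discrete distribution. The sketch below is illustrative only: the function name and the example joint distribution of the condition c and the variable x are invented for the purpose.

```python
from fractions import Fraction


def total_variance_parts(joint):
    """`joint` maps (c, x) to a probability. Returns the triple
    (V(x), E_c{V(x|c)}, V_c{E(x|c)}) computed exactly."""
    p_c = {}
    for (c, _x), p in joint.items():
        p_c[c] = p_c.get(c, 0) + p
    mean = sum(p * x for (_c, x), p in joint.items())
    var = sum(p * (x - mean) ** 2 for (_c, x), p in joint.items())
    cond_mean = {
        c: sum(p * x for (cc, x), p in joint.items() if cc == c) / p_c[c]
        for c in p_c
    }
    cond_var = {
        c: sum(p * (x - cond_mean[c]) ** 2
               for (cc, x), p in joint.items() if cc == c) / p_c[c]
        for c in p_c
    }
    mean_of_var = sum(p_c[c] * cond_var[c] for c in p_c)
    var_of_mean = sum(p_c[c] * (cond_mean[c] - mean) ** 2 for c in p_c)
    return var, mean_of_var, var_of_mean


# A made-up joint distribution with two conditions "a" and "b".
joint = {
    ("a", 0): Fraction(1, 4), ("a", 2): Fraction(1, 4),
    ("b", 1): Fraction(1, 6), ("b", 5): Fraction(1, 3),
}
v, ev, ve = total_variance_parts(joint)
print(v == ev + ve)  # the two parts add up to the unconditional variance
```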


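39.37 Using (39.73), we now see that (39.71) may be written

$$V(\hat\mu) = E_{12}\{V_3(\hat\mu)\} + V_{12}\{E_3(\hat\mu)\}, \tag{39.74}$$

where we write the symbol "12" for the first- and second-stage conditioning. (39.74)
displays the required variance in two parts, which we now evaluate separately.

Consider first the value of $V_3(\hat\mu)$. At the third (more generally, the last) stage of
selection, each of the $\sum_{i=1}^{n_1} n_{i2}$ selected second-stage units is sampled; the sampling is
with equal probabilities, $n_{ij3}$ individuals being selected out of $N_{ij3}$. This is in effect
a stratified sample with each second-stage unit playing the role of a stratum. Defining

$$m_{ij} = \frac{1}{n_{ij3}}\sum_{k=1}^{n_{ij3}} y_{ijk},$$

we know that

$$V(m_{ij}) = \frac{\sigma_{ij3}^2}{n_{ij3}}\left(1-\frac{n_{ij3}}{N_{ij3}}\right),$$

where $\sigma_{ij3}^2$ is the population variance in the second-stage unit from which this sample of
$n_{ij3}$ was drawn. It follows from (39.69) that

$$V_3(\hat\mu) = \left(\frac{N_1}{N n_1}\right)^2 \sum_{i=1}^{n_1}\left(\frac{N_{i2}}{n_{i2}}\right)^2 \sum_{j=1}^{n_{i2}} N_{ij3}^2\,\frac{\sigma_{ij3}^2}{n_{ij3}}\left(1-\frac{n_{ij3}}{N_{ij3}}\right). \tag{39.75}$$

We now have to find the expectation of (39.75) when the selections at earlier stages
are allowed to vary. In allowing second-stage selections to vary, we obtain

$$E_2 V_3(\hat\mu) = \left(\frac{N_1}{N n_1}\right)^2 \sum_{i=1}^{n_1}\frac{N_{i2}^2}{n_{i2}}\; E_2\left[\frac{1}{n_{i2}}\sum_{j=1}^{n_{i2}} N_{ij3}^2\,\frac{\sigma_{ij3}^2}{n_{ij3}}\left(1-\frac{n_{ij3}}{N_{ij3}}\right)\right],$$

and since the expression in square brackets is a sample mean whose expectation is the
corresponding population mean, this is

$$= \left(\frac{N_1}{N n_1}\right)^2 \sum_{i=1}^{n_1}\frac{N_{i2}}{n_{i2}}\sum_{j=1}^{N_{i2}} N_{ij3}^2\,\frac{\sigma_{ij3}^2}{n_{ij3}}\left(1-\frac{n_{ij3}}{N_{ij3}}\right).$$

Similarly, with the first stage varying,

$$E_1 E_2 V_3(\hat\mu) = \frac{N_1^2}{N^2 n_1}\; E_1\left[\frac{1}{n_1}\sum_{i=1}^{n_1}\frac{N_{i2}}{n_{i2}}\sum_{j=1}^{N_{i2}} N_{ij3}^2\,\frac{\sigma_{ij3}^2}{n_{ij3}}\left(1-\frac{n_{ij3}}{N_{ij3}}\right)\right]
= \frac{N_1}{N^2 n_1}\sum_{i=1}^{N_1}\frac{N_{i2}}{n_{i2}}\sum_{j=1}^{N_{i2}} N_{ij3}^2\,\frac{\sigma_{ij3}^2}{n_{ij3}}\left(1-\frac{n_{ij3}}{N_{ij3}}\right). \tag{39.76}$$

We have thus evaluated the first term on the right of (39.74).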


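39.38 For the second term in (39.74), we first need the value of $E_3(\hat\mu)$. From
(39.69), this is

$$E_3(\hat\mu) = \frac{N_1}{N n_1}\sum_{i=1}^{n_1}\frac{N_{i2}}{n_{i2}}\sum_{j=1}^{n_{i2}} N_{ij3}\,E(m_{ij})
= \frac{N_1}{N n_1}\sum_{i=1}^{n_1}\frac{N_{i2}}{n_{i2}}\sum_{j=1}^{n_{i2}} N_{ij3}\,\mu_{ij}, \tag{39.77}$$

where $\mu_{ij}$ is the population mean corresponding to the sample mean $m_{ij}$. We write
$N_{ij3}\,\mu_{ij} = T_{ij}$ for the total of the y-values in this $(i, j)$th unit. We now re-apply (39.73)
to the second term on the right of (39.74), which then becomes

$$V(\hat\mu) = E_1 E_2\{V_3(\hat\mu)\} + E_1[V_2\{E_3(\hat\mu)\}] + V_1[E_2\{E_3(\hat\mu)\}]. \tag{39.78}$$

The last term on the right of (39.78) can now be obtained from (39.77), for, as before,

$$E_2\{E_3(\hat\mu)\} = \frac{N_1}{N n_1}\sum_{i=1}^{n_1} N_{i2}\; E_2\left[\frac{1}{n_{i2}}\sum_{j=1}^{n_{i2}} T_{ij}\right]
= \frac{N_1}{N n_1}\sum_{i=1}^{n_1}\sum_{j=1}^{N_{i2}} T_{ij},$$

and hence

$$V_1[E_2\{E_3(\hat\mu)\}] = \left(\frac{N_1}{N}\right)^2 V\left[\frac{1}{n_1}\sum_{i=1}^{n_1}\left(\sum_{j=1}^{N_{i2}} T_{ij}\right)\right].$$

The variance required is that of a sample mean, and is therefore

$$V_1[E_2\{E_3(\hat\mu)\}] = \left(\frac{N_1}{N}\right)^2 \frac{\sigma_{T1}^2}{n_1}\left(1-\frac{n_1}{N_1}\right), \tag{39.79}$$

where $\sigma_{T1}^2$ is the variance between the totals of the y-values in the first-stage units, i.e.

$$\sigma_{T1}^2 = \frac{1}{N_1-1}\sum_{i=1}^{N_1}\left(T_i - \frac{1}{N_1}\sum_{i=1}^{N_1} T_i\right)^2,$$

where $T_i = \sum_j T_{ij}$.

Finally, we require the middle term on the right of (39.78). From (39.77),

$$V_2\{E_3(\hat\mu)\} = \left(\frac{N_1}{N n_1}\right)^2 \sum_{i=1}^{n_1} N_{i2}^2\; V\left[\frac{1}{n_{i2}}\sum_{j=1}^{n_{i2}} T_{ij}\right]
= \left(\frac{N_1}{N n_1}\right)^2 \sum_{i=1}^{n_1} N_{i2}^2\,\frac{\sigma_{Ti2}^2}{n_{i2}}\left(1-\frac{n_{i2}}{N_{i2}}\right), \tag{39.80}$$

where $\sigma_{Ti2}^2$ is the variance between second-stage unit totals within the $i$th first-stage
unit, i.e.

$$\sigma_{Ti2}^2 = \frac{1}{N_{i2}-1}\sum_{j=1}^{N_{i2}}\left(T_{ij} - \frac{1}{N_{i2}}\sum_{j=1}^{N_{i2}} T_{ij}\right)^2.$$

(39.80) now gives

$$E_1[V_2\{E_3(\hat\mu)\}] = \frac{N_1^2}{N^2 n_1}\; E_1\left[\frac{1}{n_1}\sum_{i=1}^{n_1} N_{i2}^2\,\frac{\sigma_{Ti2}^2}{n_{i2}}\left(1-\frac{n_{i2}}{N_{i2}}\right)\right]
= \frac{N_1}{N^2 n_1}\sum_{i=1}^{N_1} N_{i2}^2\,\frac{\sigma_{Ti2}^2}{n_{i2}}\left(1-\frac{n_{i2}}{N_{i2}}\right). \tag{39.81}$$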
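39.39 Thus, substituting (39.76), (39.79) and (39.81) into (39.78), we finally
have

$$V(\hat\mu) = \left(\frac{N_1}{N}\right)^2\frac{\sigma_{T1}^2}{n_1}\left(1-\frac{n_1}{N_1}\right)
+ \frac{N_1}{N^2 n_1}\sum_{i=1}^{N_1} N_{i2}^2\,\frac{\sigma_{Ti2}^2}{n_{i2}}\left(1-\frac{n_{i2}}{N_{i2}}\right)
+ \frac{N_1}{N^2 n_1}\sum_{i=1}^{N_1}\frac{N_{i2}}{n_{i2}}\sum_{j=1}^{N_{i2}} N_{ij3}^2\,\frac{\sigma_{ij3}^2}{n_{ij3}}\left(1-\frac{n_{ij3}}{N_{ij3}}\right). \tag{39.82}$$

(39.82) shows, as was evident from (39.78) and indeed from intuitive considerations,
that there is a contribution to the variance of the estimator from each stage of sampling.
As a special case of (39.82), consider the symmetrical situation

    $N_{i2} = N_2$, all $i$;  $N_{ij3} = N_3$, all $i, j$;  $N = N_1 N_2 N_3$;
    $n_{i2} = n_2$, all $i$;  $n_{ij3} = n_3$, all $i, j$;  $n = n_1 n_2 n_3$.

The estimator $\hat\mu$ here reduced at (39.70) to the overall sample mean. (39.82) reduces
to

$$V(\hat\mu) = \left(\frac{N_1}{N}\right)^2\frac{\sigma_{T1}^2}{n_1}\left(1-\frac{n_1}{N_1}\right)
+ \frac{N_1 N_2^2}{N^2 n_1 n_2}\left(1-\frac{n_2}{N_2}\right)\sum_{i=1}^{N_1}\sigma_{Ti2}^2
+ \frac{N_1 N_2 N_3^2}{N^2 n_1 n_2 n_3}\left(1-\frac{n_3}{N_3}\right)\sum_{i=1}^{N_1}\sum_{j=1}^{N_2}\sigma_{ij3}^2. \tag{39.83}$$

If we now define

$$\mu_i = T_i/(N_2 N_3), \qquad \mu_{ij} = T_{ij}/N_3,$$

$$\sigma_1^2 = \frac{1}{N_1-1}\sum_{i=1}^{N_1}(\mu_i-\mu)^2 = \frac{\sigma_{T1}^2}{(N_2 N_3)^2},$$

$$\sigma_2^2 = \frac{1}{N_1(N_2-1)}\sum_{i=1}^{N_1}\sum_{j=1}^{N_2}(\mu_{ij}-\mu_i)^2 = \frac{1}{N_1 N_3^2}\sum_{i=1}^{N_1}\sigma_{Ti2}^2,$$

and

$$\sigma_3^2 = \frac{1}{N_1 N_2 (N_3-1)}\sum_{i=1}^{N_1}\sum_{j=1}^{N_2}\sum_{k=1}^{N_3}(y_{ijk}-\mu_{ij})^2 = \frac{1}{N_1 N_2}\sum_{i=1}^{N_1}\sum_{j=1}^{N_2}\sigma_{ij3}^2,$$

we may rewrite (39.83) in this symmetrical case as

$$V(\hat\mu) = \frac{\sigma_1^2}{n_1}\left(1-\frac{n_1}{N_1}\right) + \frac{\sigma_2^2}{n_1 n_2}\left(1-\frac{n_2}{N_2}\right) + \frac{\sigma_3^2}{n_1 n_2 n_3}\left(1-\frac{n_3}{N_3}\right). \tag{39.84}$$

(39.84) makes the extension to further stages of sampling obvious in the symmetrical
case. The general formula for $p$ stages is

$$V(\hat\mu) = \sum_{l=1}^{p}\frac{\sigma_l^2}{n_1 n_2 \cdots n_l}\left(1-\frac{n_l}{N_l}\right), \tag{39.85}$$

where

$$\sigma_l^2 = \frac{1}{N_1 \cdots N_{l-1}(N_l-1)}\sum_{i_1=1}^{N_1}\cdots\sum_{i_l=1}^{N_l}\left(\mu_{i_1 \ldots i_l} - \mu_{i_1 \ldots i_{l-1}}\right)^2.$$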


39.40 If any sampling fraction $n/N$ is unity, the corresponding term in (39.82)
disappears. In particular, if every $n_{ij3} = N_{ij3} = 1$, the last summation on the right
vanishes, and the remaining two terms give $V(\hat\mu)$ for two-stage sampling. If every
$n_{i2} = N_{i2}$ in addition, only the first term on the right survives, and we are back at
the most general form of cluster sampling with unequal-size clusters. Similarly,
the first term of (39.84) applies to equal-size cluster sampling.

There is, in fact, no difficulty in seeing how (39.82) would extend for further stages
of sampling. A fourth stage would add to the right-hand side the term

$$\frac{N_1}{N^2 n_1}\sum_{i=1}^{N_1}\frac{N_{i2}}{n_{i2}}\sum_{j=1}^{N_{i2}}\frac{N_{ij3}}{n_{ij3}}\sum_{k=1}^{N_{ij3}} N_{ijk4}^2\,\frac{\sigma_{ijk4}^2}{n_{ijk4}}\left(1-\frac{n_{ijk4}}{N_{ijk4}}\right),$$

and $\sigma_{ij3}^2$ in (39.82) would have to be replaced by $\sigma_{Tij3}^2$, defined as an obvious extension
of $\sigma_{T1}^2$ and $\sigma_{Ti2}^2$.
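
For the symmetrical case, the p-stage formula (39.85) is easily evaluated. The following is a minimal sketch under the assumption that the stage variances, sample sizes and population sizes are supplied as three lists of equal length; the function name is illustrative.

```python
def symmetric_multistage_variance(sigma2, n, N):
    """Evaluate (39.85): the sum over stages l of
    sigma_l^2 / (n_1 ... n_l) * (1 - n_l / N_l)."""
    total = 0.0
    units = 1  # running product n_1 * ... * n_l
    for s2, n_l, N_l in zip(sigma2, n, N):
        units *= n_l
        total += s2 / units * (1 - n_l / N_l)
    return total


# Three stages, each with sampling fraction one half.
print(symmetric_multistage_variance([4, 2, 1], [2, 2, 2], [4, 4, 4]))  # 1.3125
# A stage sampled completely contributes nothing.
print(symmetric_multistage_variance([4, 2], [4, 2], [4, 4]))           # 0.125
```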

However, it is extremely rare in practice for multi-stage sampling to use
equal-probabilities sampling throughout. The reason is, quite simply, that the variances
in (39.82) are variances between totals of the variable $y$ in the different units. When
the units vary considerably in size (i.e. contain widely different numbers of next-stage
units) the effect is to make the variance of $\hat\mu$ very large. As we have already
seen, this point does not arise in the symmetrical case when all units at any stage are
of equal size and the variances can be redefined as variances between means. In
general, however, we are obliged to seek some other sampling scheme to reduce the
sampling variance of $\hat\mu$ to acceptable levels. In fact we achieve this by sampling
with varying probabilities. We may, of course, use any sets of probabilities whatever
at each stage, with $\hat\mu$ defined by (39.68), and calculate $V(\hat\mu)$ from (39.78); but in general
the terms on the right of (39.78) will be more complicated since they will reflect the
varying probabilities at every stage.

A completely general expression for $V(\hat\mu)$ is formally derived at (39.97-8) below
for the purpose of estimating sampling variance. Meanwhile, we consider one sample
design which is important in practice.


Sampling with probability proportional to size 

39.41 Inspection of (39.68) shows that if we make every $\pi_{(ijk)} = n/N$, the estimator
$\hat\mu$ will reduce to the sample mean $m$, exactly as in the equal-probabilities symmetrical
case at (39.70). To achieve this, we require only that the probability of selecting the
$i$th unit at the first stage be $n_1 A_i^{(1)}/N$; that the probability at the second stage of selecting
the $j$th unit from the $i$th first-stage unit be $n_2 A_{ij}^{(2)}/A_i^{(1)}$; and that the probability
at the third stage of $y_{ijk}$ being selected be $n_3/A_{ij}^{(2)}$. We then have

$$\pi_{(ijk)} = \frac{n_1 A_i^{(1)}}{N}\cdot\frac{n_2 A_{ij}^{(2)}}{A_i^{(1)}}\cdot\frac{n_3}{A_{ij}^{(2)}} = \frac{n_1 n_2 n_3}{N} = \frac{n}{N}.$$

It will be seen that within any penultimate-stage unit, final-stage selection is with
equal probabilities, but these probabilities in general will differ between penultimate
units. The $A_i^{(1)}$, $A_{ij}^{(2)}$ may be any convenient sets of numbers we choose. A sample
design which satisfies $\pi_{(ijk)} = n/N$ is said to be self-weighting, since $\hat\mu$ is the (equally
weighted) mean of the sample.

One simple and convenient choice is to make

$$A_i^{(1)} = \sum_{j=1}^{N_{i2}} N_{ij3}, \qquad A_{ij}^{(2)} = N_{ij3}.$$

Each unit at the first stage then has probability of selection proportional to the number
of individuals it contains; the same is true at the second stage; and at the final stage,
selection is with equal probabilities. We express this by saying that we sample with
probability proportional to size (p.p.s.) at each of the earlier stages and with equal
probabilities at the last stage.

It is easy to see that for any number p>2 of stages, overall selection probabilities
for every individual will be equal to $n/N$ if we sample with p.p.s. at all but the last
stage, where equal probabilities are to be used.

P.p.s. sampling was first theoretically investigated by Hansen and Hurwitz (1943), 
and was actually the earliest form of explicit unequal-probabilities sampling. 
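
The self-weighting property of the p.p.s. design in 39.41 can be confirmed by multiplying out the stage probabilities. The sketch below is illustrative: the function name and the layout `sizes[i][j]` (the number of individuals in the j-th second-stage unit of the i-th first-stage unit) are assumptions, and the example sizes are made up so that every stage probability is at most 1.

```python
from fractions import Fraction


def pps_overall_probabilities(sizes, n1, n2, n3):
    """Overall selection probability of an individual in each unit (i, j)
    when the first two stages use probabilities proportional to size and
    the last stage uses equal probabilities."""
    N = sum(sum(row) for row in sizes)
    result = []
    for row in sizes:
        A_i = sum(row)  # size of the first-stage unit
        result.append([
            Fraction(n1 * A_i, N) * Fraction(n2 * A_ij, A_i) * Fraction(n3, A_ij)
            for A_ij in row
        ])
    return result


sizes = [[10, 20], [30, 40, 50]]  # N = 150
probs = pps_overall_probabilities(sizes, n1=1, n2=1, n3=5)
print(probs)  # every entry equals n/N = 5/150 = 1/30
```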


39.42 The simplest way of achieving the self-weighting p.p.s. sample design
discussed in 39.41 is to select $n_1$ first-stage units with replacement, using probabilities
$p_i^{(1)} = \sum_{j=1}^{N_{i2}} N_{ij3}/N$ at each drawing; then to select $n_2$ second-stage units with
replacement from each of the $n_1$ selected first-stage units, using probabilities
$p_{ij}^{(2)} = N_{ij3}\big/\sum_{j=1}^{N_{i2}} N_{ij3}$ at each drawing; and finally to select $n_3$ third-stage units
without replacement from each of the $n_1 n_2$ selected second-stage units with equal
probabilities $p^{(3)} = 1/N_{ij3}$ at each drawing. The sampling with replacement at the two
p.p.s. stages enables us to use the simplified theory of 39.12 in what follows.

We write the estimator

$$\hat\mu = \frac{1}{n}\sum_{i=1}^{n_1}\sum_{j=1}^{n_2}\sum_{k=1}^{n_3} y_{ijk} = \frac{1}{n_1 n_2}\sum_{i=1}^{n_1}\sum_{j=1}^{n_2} m_{ij}.$$

Its variance is given by (39.78), whose terms we now evaluate.


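39.43 Because the final-stage sampling is with equal probabilities we have, as
in 39.37,

$$V(m_{ij}) = \frac{\sigma_{ij3}^2}{n_3}\left(1-\frac{n_3}{N_{ij3}}\right),$$

so that

$$V_3(\hat\mu) = \frac{1}{(n_1 n_2)^2}\sum_{i=1}^{n_1}\sum_{j=1}^{n_2} V(m_{ij}).$$

Hence

$$V_3(\hat\mu) = \frac{1}{n_1^2 n_2 n_3}\sum_{i=1}^{n_1}\left[\frac{1}{n_2}\sum_{j=1}^{n_2}\sigma_{ij3}^2\left(1-\frac{n_3}{N_{ij3}}\right)\right]. \tag{39.87}$$

At the second stage, $n_2$ out of $N_{i2}$ units are selected within the $i$th first-stage unit,
with probabilities $N_{ij3}\big/\sum_{j=1}^{N_{i2}} N_{ij3}$ at each drawing. Thus (39.87) becomes

$$E_2 V_3(\hat\mu) = \frac{1}{n_1^2 n_2 n_3}\sum_{i=1}^{n_1}\sum_{j=1}^{N_{i2}}\frac{N_{ij3}}{N_i}\,\sigma_{ij3}^2\left(1-\frac{n_3}{N_{ij3}}\right),$$

where we now write $N_i = \sum_{j=1}^{N_{i2}} N_{ij3}$ for the total number of individuals in the $i$th
first-stage unit. Similarly

$$E_1 E_2 V_3(\hat\mu) = \frac{1}{n_1 n_2 n_3}\sum_{i=1}^{N_1}\frac{N_i}{N}\sum_{j=1}^{N_{i2}}\frac{N_{ij3}}{N_i}\,\sigma_{ij3}^2\left(1-\frac{n_3}{N_{ij3}}\right)
= \frac{1}{n_1 n_2 n_3 N}\sum_{i=1}^{N_1}\sum_{j=1}^{N_{i2}} N_{ij3}\,\sigma_{ij3}^2\left(1-\frac{n_3}{N_{ij3}}\right). \tag{39.88}$$

This is the first term on the right of (39.78). Further,

$$E_3(\hat\mu) = \frac{1}{n_1 n_2}\sum_{i=1}^{n_1}\sum_{j=1}^{n_2}\mu_{ij}, \tag{39.89}$$

$$E_2 E_3(\hat\mu) = \frac{1}{n_1}\sum_{i=1}^{n_1}\left[\sum_{j=1}^{N_{i2}}\frac{N_{ij3}}{N_i}\,\mu_{ij}\right] = \frac{1}{n_1}\sum_{i=1}^{n_1}\mu_i,$$

where $\mu_i$, as previously, is the mean of $y$ in the $i$th first-stage unit. Thus

$$V_1 E_2 E_3(\hat\mu) = V\left(\frac{1}{n_1}\sum_{i=1}^{n_1}\mu_i\right) = \frac{1}{n_1}V(\mu_i)
= \frac{1}{n_1}\sum_{i=1}^{N_1}\frac{N_i}{N}(\mu_i-\mu)^2
= \frac{1}{n_1 N}\sum_{i=1}^{N_1} N_i(\mu_i-\mu)^2. \tag{39.90}$$

(39.90) is the last term on the right of (39.78). The middle term
there is, from (39.89),

$$E_1 V_2 E_3(\hat\mu) = E_1\left[\frac{1}{n_1^2}\sum_{i=1}^{n_1} V_2\left(\frac{1}{n_2}\sum_{j=1}^{n_2}\mu_{ij}\right)\right]
= E_1\left[\frac{1}{n_1^2 n_2}\sum_{i=1}^{n_1}\sum_{j=1}^{N_{i2}}\frac{N_{ij3}}{N_i}(\mu_{ij}-\mu_i)^2\right]$$

$$= E_1\left[\frac{1}{n_1^2 n_2}\sum_{i=1}^{n_1}\frac{1}{N_i}\sum_{j=1}^{N_{i2}} N_{ij3}(\mu_{ij}-\mu_i)^2\right]
= \frac{1}{n_1 n_2 N}\sum_{i=1}^{N_1}\sum_{j=1}^{N_{i2}} N_{ij3}(\mu_{ij}-\mu_i)^2. \tag{39.91}$$

Putting (39.88), (39.90) and (39.91) into (39.78), we obtain

$$V(\hat\mu) = \frac{1}{n_1 N}\sum_{i=1}^{N_1} N_i(\mu_i-\mu)^2
+ \frac{1}{n_1 n_2 N}\sum_{i=1}^{N_1}\sum_{j=1}^{N_{i2}} N_{ij3}(\mu_{ij}-\mu_i)^2
+ \frac{1}{n_1 n_2 n_3 N}\sum_{i=1}^{N_1}\sum_{j=1}^{N_{i2}} N_{ij3}\,\sigma_{ij3}^2\left(1-\frac{n_3}{N_{ij3}}\right). \tag{39.92}$$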

39.44 It will be observed that (39.92) is almost exactly of the same form as (39.84),
the variance formula for equal-probabilities sampling in the symmetrical case. In fact,
apart from the simplification occasioned by the sampling being now with replacement
at the first two stages, the only difference is that the unit sizes $N_i$ and $N_{ij3}$ occur as
weights in each component of the variance, as they must do because of the unequal sizes
of units. We may write (39.92) in the same form as (39.84),

$$V(\hat\mu) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_1 n_2} + \frac{\sigma_3^2}{n_1 n_2 n_3}, \tag{39.93}$$

with obvious definitions, and all the variances are between means, not totals as in
(39.82). Thus the present p.p.s. sample design has the effect of eliminating the influence
of the varying sizes of the units at the stages of sampling before the last.

Clearly, a similar result will follow for any number of stages. The two-stage
result is obtained from (39.92) by putting $n_2 = N_{i2} = 1$ and making appropriate
changes in notation. The reader is asked in Exercise 39.24 to show that in this case,
on defining symbols obviously,

$$V(\hat\mu) = \frac{1}{n_1 N}\sum_{i=1}^{N_1} N_i(\mu_i-\mu)^2 + \frac{1}{n_1 n_2 N}\sum_{i=1}^{N_1} N_i\,\sigma_i^2\left(1-\frac{n_2}{N_i}\right). \tag{39.94}$$
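
When the whole population is known, the three components of the p.p.s. variance (39.92) can be computed directly. The sketch below is illustrative only: the function name and the nested-list layout `pop[i][j]` (the list of y-values in second-stage unit (i, j)) are assumptions, and every second-stage unit is assumed to contain at least two individuals so that its variance, with divisor one less than the unit size, is defined.

```python
def pps_three_stage_variance(pop, n1, n2, n3):
    """Evaluate the three terms of (39.92) from a fully known population."""
    N = sum(len(u) for first in pop for u in first)
    mu = sum(sum(u) for first in pop for u in first) / N
    between_first = between_second = within = 0.0
    for first in pop:
        N_i = sum(len(u) for u in first)
        mu_i = sum(sum(u) for u in first) / N_i
        between_first += N_i * (mu_i - mu) ** 2
        for u in first:
            N_ij = len(u)
            mu_ij = sum(u) / N_ij
            between_second += N_ij * (mu_ij - mu_i) ** 2
            var_ij = sum((y - mu_ij) ** 2 for y in u) / (N_ij - 1)
            within += N_ij * var_ij * (1 - n3 / N_ij)
    return (between_first / (n1 * N)
            + between_second / (n1 * n2 * N)
            + within / (n1 * n2 * n3 * N))


# A small made-up population: two first-stage units, each with two
# second-stage units of two individuals.
pop = [[[0, 2], [0, 2]], [[4, 6], [4, 6]]]
print(pps_three_stage_variance(pop, n1=2, n2=1, n3=1))  # 2.0 + 0.0 + 0.5 = 2.5
```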


Estimation of sampling variance in multi-stage sampling 

39.45 Although, as indicated at the end of 39.40, the general formula for $V(\hat\mu)$
for completely arbitrary probabilities of selection at each stage is lengthy and of no
particular interest, it is a remarkable fact that a very general method for the unbiassed
estimation of the sampling variance of an estimator in multi-stage sampling is easily
obtained, and that it is of a very simple form.

Suppose that there is an arbitrary number of stages of sampling, and that we sample
without replacement at the first stage, where the probability of the $i$th unit being
included among the $n_1$ units selected is $\pi_i^{(1)}$, while the joint probability of both $i$th and
$j$th units being selected at the first stage is $\pi_{ij}^{(1)}$. Using (39.73), we write the variance
of any estimator $\hat\theta$ (not necessarily of $\mu$) as

$$V(\hat\theta) = E_1\{V_{>1}(\hat\theta)\} + V_1\{E_{>1}(\hat\theta)\}, \tag{39.95}$$

where we use the omnibus symbol ">1" to represent all stages of sampling after
the first. We suppose that the estimator $\hat\theta$ may be written in the form

$$\hat\theta = \sum_{i=1}^{n_1} t_i.$$

(39.68) is of this form, and so more generally is (39.20) for any number of stages. If
we apply the alternative estimator (39.27) to the first-stage sampling, therefore writing
$n_1$ for $n$ and interpreting $y_i$ as the sum of $y$ in the $i$th first-stage unit, this is another
estimator of the form we are discussing. So is its improved form in 39.11.

Since later stages of sampling are carried out independently within the different
selected first-stage units,

$$V_{>1}(\hat\theta) = \sum_{i=1}^{n_1} V_{>1}(t_i),$$

and hence (39.95) may be rewritten

$$V(\hat\theta) = V_1\left\{\sum_{i=1}^{n_1} E_{>1}(t_i)\right\} + E_1\left\{\sum_{i=1}^{n_1} V_{>1}(t_i)\right\}
= V_1\left\{\sum_{i=1}^{n_1} E_{>1}(t_i)\right\} + \sum_{i=1}^{N_1}\pi_i^{(1)}\,V_{>1}(t_i), \tag{39.96}$$

using (39.19). Applying (39.18) to the first term on the right of (39.96), and expanding
the other term, we obtain

$$V(\hat\theta) = \sum_{i=1}^{N_1}\pi_i^{(1)}(1-\pi_i^{(1)})\{E_{>1}(t_i)\}^2
+ \mathop{\sum\sum}_{i \ne j}^{N_1}(\pi_{ij}^{(1)}-\pi_i^{(1)}\pi_j^{(1)})\,E_{>1}(t_i)\,E_{>1}(t_j)
+ \sum_{i=1}^{N_1}\pi_i^{(1)}\left[E_{>1}(t_i^2)-\{E_{>1}(t_i)\}^2\right] \tag{39.97}$$

$$= \sum_{i=1}^{N_1}\pi_i^{(1)}(1-\pi_i^{(1)})\,E_{>1}(t_i^2)
+ \mathop{\sum\sum}_{i \ne j}^{N_1}(\pi_{ij}^{(1)}-\pi_i^{(1)}\pi_j^{(1)})\,E_{>1}(t_i)\,E_{>1}(t_j)
+ \sum_{i=1}^{N_1}(\pi_i^{(1)})^2\,V_{>1}(t_i). \tag{39.98}$$


39.46 We now seek an unbiassed estimator of (39.98). So far as the first two
terms on its right are concerned, we need only substitute $t_i^2$ for $E_{>1}(t_i^2)$ and $t_i t_j$ for
$E_{>1}(t_i)\,E_{>1}(t_j)$, since $t_i$ and $t_j$ are independent at stages of sampling after the first. If
we do this, the first two terms also become equal to the first two on the right of (39.97)
with $E_{>1}(t_i)$ replaced by $t_i$. Since these latter represent $V_1\{\sum_{i=1}^{n_1} E_{>1}(t_i)\}$ in (39.96), it
follows that we have

$$\sum_{i=1}^{N_1}\pi_i^{(1)}(1-\pi_i^{(1)})\,t_i^2 + \mathop{\sum\sum}_{i \ne j}^{N_1}(\pi_{ij}^{(1)}-\pi_i^{(1)}\pi_j^{(1)})\,t_i t_j
= V_1\left\{\sum_{i=1}^{n_1} t_i\right\} = V_1(\hat\theta). \tag{39.99}$$

The last term on the right of (39.98) is also easily estimated, since if

$$E_{>1}\{\hat V_{>1}(t_i)\} = V_{>1}(t_i),$$

$$E\left\{\sum_{i=1}^{n_1}\pi_i^{(1)}\,\hat V_{>1}(t_i)\right\} = E_1\left[\sum_{i=1}^{n_1}\pi_i^{(1)}\,E_{>1}\{\hat V_{>1}(t_i)\}\right]
= E_1\left[\sum_{i=1}^{n_1}\pi_i^{(1)}\,V_{>1}(t_i)\right] = \sum_{i=1}^{N_1}(\pi_i^{(1)})^2\,V_{>1}(t_i)$$

by (39.19). Thus, using (39.99), we see that (39.98) is the expected value of

$$V^*(\hat\theta) = V_1(\hat\theta) + \sum_{i=1}^{n_1}\pi_i^{(1)}\,\hat V_{>1}(t_i). \tag{39.100}$$

However, (39.100) is not a statistic, since we have yet to estimate $V_1(\hat\theta)$. If $\hat V_1(\hat\theta)$ is
unbiassed for $V_1(\hat\theta)$, we finally have the unbiassed estimator of (39.98)

$$\hat V(\hat\theta) = \hat V_1(\hat\theta) + \sum_{i=1}^{n_1}\pi_i^{(1)}\,\hat V_{>1}(t_i). \tag{39.101}$$


(39.101) expresses the rule first generally formulated by Durbin (1953) after an earlier 
more specialized statement by Yates. We state it in words: 

An unbiassed estimator of sampling variance in multi-stage sampling, when the 
first-stage sampling is without replacement, is obtainable as the sum of two components. 
The first component estimates the variance as if only the first-stage sampling had taken 
place. The second component is the weighted sum of the estimates, within the 
selected first-stage units, of the variances due to later stages of sampling (the first-stage 
units being regarded as fixed); the weights are the probabilities of selection of these 
first-stage units. 


39.47 The expression (39.101) may be broken down into further components
to facilitate its use. If we write $t_i = \sum_{j=1}^{n_{i2}} t_{ij}$, we may apply (39.101) itself to the terms
$\hat V_{>1}(t_i)$ and obtain

$$\hat V_{>1}(t_i) = \hat V_2(t_i) + \sum_{j=1}^{n_{i2}}\pi_{ij}^{(2)}\,\hat V_{>2}(t_{ij}), \tag{39.102}$$

where $\pi_{ij}^{(2)}$ is the probability of selecting the $j$th second-stage unit in the $i$th first-stage
unit. Substituting (39.102) into (39.101), we obtain

$$\hat V(\hat\theta) = \hat V_1(\hat\theta) + \sum_{i=1}^{n_1}\pi_i^{(1)}\,\hat V_2(t_i) + \sum_{i=1}^{n_1}\pi_i^{(1)}\sum_{j=1}^{n_{i2}}\pi_{ij}^{(2)}\,\hat V_{>2}(t_{ij}). \tag{39.103}$$

The pattern for further extension is now obvious. For p>2 stages, the result is

$$\hat V(\hat\theta) = \hat V_1(\hat\theta) + \sum_{l=2}^{p}\left[\sum_{i}\pi_i^{(1)}\sum_{j}\pi_{ij}^{(2)}\cdots\sum\pi_{ij\ldots}^{(l-1)}\,\hat V_l(t_{ij\ldots})\right]. \tag{39.104}$$

It follows from (39.104) that if all $\pi_i^{(1)}$ are small enough to be negligible (which
can only happen if the number $N_1$ of first-stage units in the population is large) only
the first term on the right-hand side contributes materially to $\hat V(\hat\theta)$. In this case,
the methods used at the stages of sampling after the first do not affect the form of the
approximate estimator of sampling variance, although they will, of course, affect its
value since they determine the value of the first term $\hat V_1(\hat\theta)$.

1 


39.48 In the completely symmetrical equal-probabilities case for which $V(\hat\mu)$ was
given at (39.85), its estimator (as the reader is asked to show in Exercise 39.25) is
given by (39.104) as

$$\hat V(\hat\mu) = \sum_{l=1}^{p}\left(\frac{n_1}{N_1}\cdots\frac{n_{l-1}}{N_{l-1}}\right)\frac{s_l^2}{n_1 n_2 \cdots n_l}\left(1-\frac{n_l}{N_l}\right), \tag{39.105}$$

where

$$s_l^2 = \frac{1}{n_1 \cdots n_{l-1}(n_l-1)}\sum_{i_1=1}^{n_1}\cdots\sum_{i_l=1}^{n_l}\left(m_{i_1 \ldots i_l} - m_{i_1 \ldots i_{l-1}}\right)^2$$

is the sample correspondent of $\sigma_l^2$ defined below (39.85). (39.105) has the same
structure as (39.85), save only that every term after the first is multiplied by the product
of earlier-stage sampling fractions $\frac{n_1}{N_1}\cdot\frac{n_2}{N_2}\cdots$. Here again, as at the end of 39.47,
we see that if $n_1/N_1$ is negligible,

$$\hat V(\hat\mu) = \frac{s_1^2}{n_1}, \tag{39.106}$$

irrespective of the methods of sampling used at later stages than the first.
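
For two stages, the estimator of 39.48 has only two terms: the usual first-stage term, and a second-stage term multiplied by the first-stage sampling fraction. The following is a minimal sketch for the symmetrical two-stage case; the function name and the data layout (one list of n2 observed values per selected first-stage unit) are illustrative assumptions.

```python
def two_stage_variance_estimate(sample, N1, N2):
    """Two-stage symmetrical case of the variance estimator in 39.48.

    sample[i] holds the n2 observed y-values from the i-th selected
    first-stage unit; N1 and N2 are the population counts at each stage.
    """
    n1 = len(sample)
    n2 = len(sample[0])
    unit_means = [sum(u) / n2 for u in sample]
    m = sum(unit_means) / n1
    # Sample variance between first-stage unit means.
    s1_sq = sum((mi - m) ** 2 for mi in unit_means) / (n1 - 1)
    # Pooled sample variance within first-stage units.
    s2_sq = sum((y - mi) ** 2
                for u, mi in zip(sample, unit_means) for y in u) / (n1 * (n2 - 1))
    first = s1_sq / n1 * (1 - n1 / N1)
    second = (n1 / N1) * s2_sq / (n1 * n2) * (1 - n2 / N2)
    return first + second


print(two_stage_variance_estimate([[1, 3], [5, 7]], N1=4, N2=4))  # 2.0 + 0.125
```

If n1/N1 is negligible the second term is negligible too, which is the simplification stated at (39.106).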


39.49 If we are sampling with replacement at the first stage, (39.104) must be
modified to take account of the replacement of (39.19) by (39.33). It will be sufficient
to reconsider the derivation of (39.101), from which (39.104) followed. We first note
that since (39.96) depended upon later stages of sampling being carried out independently
in the different selected first-stage units, we must now insist that if a first-stage
unit is selected r>1 times, the later stages of sampling must be carried out r
times independently within it.

As we have already seen in 39.12, the effect of sampling with replacement on (39.18)
is to allow $\pi_{ij}$ (but not $\pi_i\pi_j$) to have equal suffixes in the double summation. We
may therefore absorb the term in $\pi_i^2$ from the first summation into the second. Thus,
(39.97) must be replaced by

$$V(\hat\theta) = \sum_{i=1}^{N_1}\pi_i^{(1)}\{E_{>1}(t_i)\}^2
+ \mathop{\sum\sum}_{i,j=1}^{N_1}(\pi_{ij}^{(1)}-\pi_i^{(1)}\pi_j^{(1)})\,E_{>1}(t_i)\,E_{>1}(t_j)
+ \sum_{i=1}^{N_1}\pi_i^{(1)}\left[E_{>1}(t_i^2)-\{E_{>1}(t_i)\}^2\right]$$

$$= \sum_{i=1}^{N_1}\pi_i^{(1)}\,E_{>1}(t_i^2) + \mathop{\sum\sum}_{i,j=1}^{N_1}(\pi_{ij}^{(1)}-\pi_i^{(1)}\pi_j^{(1)})\,E_{>1}(t_i)\,E_{>1}(t_j), \tag{39.107}$$

the analogue of (39.98). As at (39.99), we see that this is the expected value of

$$V_1(\hat\theta) = \sum_{i=1}^{N_1}\pi_i^{(1)}\,t_i^2 + \mathop{\sum\sum}_{i,j=1}^{N_1}(\pi_{ij}^{(1)}-\pi_i^{(1)}\pi_j^{(1)})\,t_i t_j, \tag{39.108}$$

so that here the unbiassed estimating statistic is simply

$$\hat V(\hat\theta) = \hat V_1(\hat\theta) \tag{39.109}$$

instead of (39.101). No contribution arises from the subsequent stages of sampling,
which influence the value of $\hat V_1(\hat\theta)$, but not its form. Clearly, this result also replaces
the formula (39.104). Thus the Yates-Durbin rule given at the end of 39.46 simplifies
if sampling at the first stage is with replacement: only the first component given by
the rule should be calculated.

If first-stage sampling is with equal probabilities, (39.109) reduces to (39.106),
which now holds exactly. More generally, in view of the remarks in the last paragraph
of 39.47 we see that (39.109) may be regarded as the limit of (39.104) when all
$\pi_i^{(1)} \to 0$, just as the estimator of variance in single-stage simple random sampling with
replacement may be derived from the without-replacement formula by letting $n/N \to 0$.


39.50 If the probabilities of selection are the same at each first-stage drawing, the
general formula (39.109) can actually be explicitly written down, for the estimator of
variance in one-stage unequal-probabilities sampling with replacement has already
been given for that case in 39.12. Here, the estimator is $\hat\theta = \sum_{i=1}^{n_1} t_i$ instead of (39.34),
so that instead of (39.36) we have

$$\hat V(\hat\theta) = \hat V_1(\hat\theta) = \frac{n_1}{n_1-1}\sum_{i=1}^{n_1}\left(t_i - \frac{\hat\theta}{n_1}\right)^2, \tag{39.110}$$

a remarkably simple form for estimating the sampling variance in multi-stage sampling
with any number of stages when the first stage, with replacement, uses the same
unequal probabilities at each drawing; the other stages are arbitrary, apart from the
independent sampling condition in the first paragraph of 39.49.
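
The estimator (39.110) needs only the first-stage components of the estimator. A minimal sketch follows; the function name is illustrative, and the list `t` is assumed to hold one component per first-stage drawing (a unit drawn more than once contributes one independently subsampled component per drawing).

```python
def with_replacement_variance_estimate(t):
    """Evaluate (39.110): n1/(n1-1) times the sum of squared deviations
    of the components t_i from their mean theta_hat/n1."""
    n1 = len(t)
    theta_hat = sum(t)
    return n1 / (n1 - 1) * sum((ti - theta_hat / n1) ** 2 for ti in t)


print(with_replacement_variance_estimate([1.0, 2.0, 3.0]))  # 3/2 * 2 = 3.0
```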


Minimum variance allocation in multi-stage sampling 

39.51 We first confine ourselves to the situation where, at each stage of sampling, 
the same number of units is selected from each previous-stage unit. (For three stages 
this means that мз = пз, Жуз = n.) In this case, both the general equal-probabilities 
formula (39.82) and the p.p.s. result (39.92) are of the form 


Vif) = og Et z (39.111) 


1 fhfig Minns 


where the $v_l$ are functions of population quantities only. In many applications, a 
fairly realistic cost function for three-stage sampling is 


$$C = c_0 + c_1 n_1 + c_2 n_1 n_2 + c_3 n_1 n_2 n_3, \qquad (39.112)$$
where $c_0$ is overhead cost and $c_l$ is the cost of sampling a single unit at the $l$th stage. 
(39.111-12) are exactly of the form (39.50-1) with $w_l = n_1 \ldots n_l$; it follows from 
(39.53) that we minimize $V(\hat\mu)$ for fixed $C$ (or $C$ for fixed $V(\hat\mu)$) by making 


$$(n_1 \ldots n_l)^2 \propto v_l / c_l, \quad l = 1, 2, 3. \qquad (39.113)$$
Taking ratios of (39.113) with successive values of $l$, we have 


$$n_2^2 = \frac{v_2 c_1}{v_1 c_2}, \qquad n_3^2 = \frac{v_3 c_2}{v_2 c_3}. \qquad (39.114)$$


$n_1$ is then determined by (39.114) and whichever of (39.111-12) is fixed. This is a 
notable result, for it implies that later-stage sample sizes are determined by variances 
and costs irrespective of total sample size $n_1 n_2 n_3$, so that if the amount of money available 
(or the estimation precision desired) in a multi-stage survey changes, only $n_1$ should 
be changed. This result clearly holds for any number of stages $p > 2$; (39.113) then 
holds for $l = 1, 2, \ldots, p$, and the $(p-1)$ ratios with successive values of $l$ determine 
$n_2, n_3, \ldots, n_p$ as at (39.114), leaving $n_1$ to be fixed by cost or accuracy considerations 
as above. 
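The allocation rule of 39.51 can be illustrated numerically. The following is a minimal sketch, assuming the variance has the form $v_1/n_1 + v_2/(n_1 n_2) + \ldots$ and the cost the form $c_0 + c_1 n_1 + c_2 n_1 n_2 + \ldots$; the function names and the numerical values of the $v_l$ and $c_l$ are purely illustrative, and the sample sizes are treated as continuous quantities (in practice they would be rounded).

```python
import math


def later_stage_sizes(v, c):
    """Return n_2, ..., n_p from n_l^2 = v_l c_{l-1} / (v_{l-1} c_l).

    v[0], c[0] hold v_1, c_1; the overhead cost c_0 plays no part here.
    """
    return [math.sqrt(v[l] * c[l - 1] / (v[l - 1] * c[l]))
            for l in range(1, len(v))]


def first_stage_size_for_cost(v, c, c0, total_cost):
    """Return the n_1 that exhausts a fixed total cost, given the optimal later sizes."""
    later = later_stage_sizes(v, c)
    cost_per_first_stage_unit, product = 0.0, 1.0
    for l, c_l in enumerate(c):
        if l > 0:
            product *= later[l - 1]          # n_2 ... n_l
        cost_per_first_stage_unit += c_l * product
    return (total_cost - c0) / cost_per_first_stage_unit


def sampling_variance(v, sizes):
    """Evaluate v_1/n_1 + v_2/(n_1 n_2) + ... for given stage sizes."""
    product, total = 1.0, 0.0
    for v_l, n_l in zip(v, sizes):
        product *= n_l
        total += v_l / product
    return total


if __name__ == "__main__":
    v = [4.0, 16.0, 64.0]      # illustrative v_1, v_2, v_3
    c = [16.0, 4.0, 1.0]       # illustrative c_1, c_2, c_3
    n2, n3 = later_stage_sizes(v, c)
    n1 = first_stage_size_for_cost(v, c, c0=100.0, total_cost=1060.0)
    print(n1, n2, n3, sampling_variance(v, [n1, n2, n3]))
```

Changing `total_cost` in this sketch alters only `n1`; the later-stage sizes depend on the $v_l$ and $c_l$ alone, which is the point made in the text.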


39.52 The result of 39.51 is concerned with the best choice of the (equal) sample sizes 
at the various stages for given probabilities of selection, i.e. the sample design is fixed with 
only sample sizes at choice. We may now ask a much more difficult question, following 
Hansen and Hurwitz (1949): which choice of probabilities of selection will minimize 
sampling variance for fixed cost? In the one-stage case, we saw in 39.8 that if probabilities 
$\pi_i$ could be made proportional to the values $y_i$ of the variable, sampling variance 
would be identically zero. The multi-stage situation is more complicated, as the 
general variance formula (39.98) indicates, for later-stage probabilities come into the 
reckoning. However, if sampling is with replacement at the first stage, (39.97) is 
replaced by (39.107). Furthermore (as the reader is asked to show in Exercise 39.27) 
if the same set of probabilities is used at each first-stage drawing, use of (39.32) reduces 
(39.107) to 


А х, 1/5) 2 х, 
VO = S ap (E 0P- {2 a EUY «E APIE (EC 


КА ру 2 
=f a E (3 a Ey. (89.115) 
>1 ту \i=1 >1 


i=1 


This depends only on the $\pi_i^{(1)}$ at the first stage. 


39.53 We now restrict ourselves to two-stage sampling, using constant-probabilities 
drawings with replacement at the first stage (so that (39.115) holds), and equal- 
probabilities sampling at the second stage, and to a self-weighting design (see 39.41), 
so that the overall probability of selection for every individual in the population is the 
same. This is two-stage p.p.s. sampling as in 39.42, and 


$$\pi_i^{(1)} \, \frac{n_i}{N_{i2}} = \frac{n}{N},$$
so that 
$$n_i = \frac{n}{N} \, \frac{N_{i2}}{\pi_i^{(1)}}. \qquad (39.116)$$
We use a slightly more general cost-function than the two-stage equivalent of (39.112), 
$$C = c_0 + n_1 c_1 + \left(\sum_{i=1}^{n_1} N_{i2}\right) c_2 + \left(\sum_{i=1}^{n_1} n_i\right) c_3, \qquad (39.117)$$


which allows two components of cost at the second stage, one proportional to the total 
size of the first-stage units sampled and the other to the size of sample. ($N_{i2} c_2$ may 
be regarded as the cost of "preparing" the ith first-stage unit for the next stage of 
sampling.) If $c_2 = 0$ and $n_i = n_2$, all $i$, we return to the form (39.112). By (39.116), 
(39.117) may be written 


ce armat E (ne). (39.118) 


However, (39.118) is a random variable which we cannot fix in advance, so we work 
instead with its expectation 


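$$E(C) = c_0 + n_1 c_1 + c_2 \sum_{i=1}^{N_1} \pi_i^{(1)} N_{i2} + n c_3$$
(using (39.33)) which we rewrite, since $\sum_{i=1}^{N_1} \pi_i^{(1)} = n_1$ by (39.32), 


$$E(C) - c_0 - n c_3 = \sum_{i=1}^{N_1} \pi_i^{(1)} (c_1 + N_{i2} c_2). \qquad (39.119)$$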


Because the sample is self-weighting, we know from 39.41 that 
m 05 

í2.Xt--LZyy-m, 
т i=1 = nci s 

so that from (39.116) 
dus Na 9 Мат 

2 T E Aus a came 

nj 787. Nang pal! 7 N ai 


say. Thus (39.115) becomes, in this case, 


N? V) (3 Е (Хт) = Ў 1 E{(Nam)} (89.120) 
nı (#=1>1 im1 WH >1 


39.54 We thus see that the expected cost function is linear in the $\pi_i^{(1)}$, while the 
variance is a linear function of their reciprocals. This was exactly the situation at 
(39.50-1), so that the argument of 39.20 holds good with the $w_l$ replaced here by the 
$\pi_i^{(1)}$, the $c_l$ by $(c_1 + N_{i2} c_2)$ and the $v_l$ by $E_{>1}\{(N_{i2} m_i)^2\}$. It follows at once from (39.53) 
that the $\pi_i^{(1)}$ which minimize $V(\hat\mu)$ for fixed $E(C)$ are given by 


$$(\pi_i^{(1)})^2 \propto \frac{E_{>1}\{(N_{i2} m_i)^2\}}{c_1 + N_{i2} c_2}, \quad i = 1, 2, \ldots, N_1. \qquad (39.121)$$
The denominator on the right is the total cost of sampling the ith first-stage unit and 
preparing it for second-stage sampling; the numerator essentially reflects the variability 
within the ith first-stage unit; the result is therefore of the general form one would 
expect. 


If $n_i = N_{i2}$, $N_{i2} m_i = \sum_{j=1}^{N_{i2}} y_{ij}$, and if $c_2 = 0$, (39.121) then reduces effectively to 
the one-stage result: $\pi_i^{(1)}$ must be proportional to the total of the $y_{ij}$ in the ith unit. 
More generally, if $N_{i2} c_2$ is negligible compared to $c_1$, and the $m_i$ vary little, we shall 
have $\pi_i^{(1)} \propto N_{i2}$, approximately, so that we sample with p.p.s. In the contrary case, 
when $c_1$ is negligible relative to $N_{i2} c_2$ we see that if the $m_i$ vary little we have $\pi_i^{(1)} \propto N_{i2}^{1/2}$ 
approximately. Many practical situations lie between these limiting cases. 
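The two limiting cases just described can be checked with a short computation. The sketch below is illustrative only: the function name is hypothetical, the first-stage unit sizes are invented, and each second moment $E_{>1}\{(N_{i2} m_i)^2\}$ is replaced by $N_{i2}^2$ (that is, the $m_i$ are taken as equal to 1 with no within-unit variation). The probabilities are scaled to sum to $n_1$; nothing in the sketch guards against a value exceeding unity.

```python
import math


def optimal_first_stage_probs(sizes, second_moments, c1, c2, n1):
    """Probabilities with square proportional to second_moment / (c1 + N_i2 * c2).

    sizes[i]          : N_i2, the size of the ith first-stage unit
    second_moments[i] : assumed value of E{(N_i2 m_i)^2} for that unit
    The result is scaled so that the probabilities sum to n1.
    """
    raw = [math.sqrt(e / (c1 + size * c2))
           for size, e in zip(sizes, second_moments)]
    scale = n1 / sum(raw)
    return [scale * r for r in raw]


if __name__ == "__main__":
    sizes = [100, 400, 900]                      # invented unit sizes
    moments = [size ** 2 for size in sizes]      # m_i = 1 in every unit (assumption)
    # Preparation cost negligible: probabilities proportional to size (p.p.s.).
    print(optimal_first_stage_probs(sizes, moments, c1=1.0, c2=0.0, n1=1))
    # Preparation cost dominant: probabilities proportional to sqrt(size).
    print(optimal_first_stage_probs(sizes, moments, c1=0.0, c2=1.0, n1=1))
```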


39.55 The evaluation of the relative efficiencies of multi-stage and one-stage 
random sampling, and even more the estimation of their efficiencies from a multi-stage 
sample, is in general much more complicated than the analogous problem for stratified 
sampling, which was treated in 39.27; Yates (1960) treats a number of special cases. 
It is extremely rare for a multi-stage sample to decrease sampling variance, and indeed 
the motive for multi-stage sampling is almost invariably to reduce costs rather than 
reduce variance directly; the additional resources can, of course, be applied to an 
increase in sample size. The result of 39.51 makes our point clear; there, one-stage 
random sampling is seen to be most efficient only if the solution of (39.114) is that 
$n_2$ and $n_3$ be as large as possible, i.e. $n_i = N_{i2}$, $n_{ij} = N_{ij3}$ everywhere. Since the $N_{i2}$ 
and $N_{ij3}$ are usually themselves very large, such a solution requires very large values 
for the cost and variance ratios in (39.114), which are almost never found in practice. 


39.56 Finally, we mention briefly that the benefits of stratification, which we 
discussed for single-stage sampling in 39.13-27, apply at every stage of a multi-stage 
sample. Practical multi-stage sample designs therefore frequently incorporate strati- 
fication, particularly at the first stage, which often contributes most to the sampling 
variance. All the foregoing theory applies separately within each stratum, including 
the Yates-Durbin rule of 39.46 and 39.49 for estimating variance. 


EXERCISES 


39.1 A simple random sample of $n$ individuals is drawn with replacement from a population 
with $N$ members, and $d$ distinct population members are included in the sample. Show that 
the mean of these $d$ distinct values is an unbiassed estimator of the population mean, and that 
its variance is smaller than that of the overall sample mean for $n > 2$, and equal to it for $n = 2$, 


by proving the inequality 
1 1 N-1 
ese) 


(Raj and Khamis, 1958. The same result 
holds if $d$ is fixed and $n$ a random variable.) 
39.2 Two units are selected from a population of $N$ units without replacement, the probability 
for the ith unit at the first drawing being $p_i$, $\sum_{i=1}^{N} p_i = 1$. The probabilities at the second 
drawing are made proportional to 


1 1 
inb; = b; (au 15) 


if the ith unit was selected at the first drawing. 


Show that 
N N 
3i abes D Se, 
= эр) ape 
jal 


that pi(2)pj is symmetric in i and j, and therefore that p; is also the unconditional probability 
that the ith unit is selected at the second drawing. Thus in 39.6, $\pi_i = 2p_i$ and 


ү 1 1 te_\-* 
sri Jes) 
(Durbin, 1965; Sampford (1967) extends the method to n>2). 


39.3 In (39.23), show that if the probabilities л; are proportional to the values y; of the 
variable (which are taken to be positive) V(@) = 0. Show that in this case the estimator (39.24) 
of the variance is also equal to zero, but that (39.22) becomes 


lom. ш 
f, - fii у> — -7 
100) = и ч лу Ми 
is. 


where m is the sample mean, Hence show that Р, (й) can take negative values. 


ó 
39.4 In(39.24), show that if луу = луу = N forallj # 1,2,and n = 2 with y;, y; observed, 
amtÓ'—m, aye 
fiy 7:3 53 9)* 01739* 


Hence show that Ў, (й) can take negative values. 
(Durbin, 1953) 


39.5 Show that if sampling without replacement with unequal probabilities is carried out 
so that at the first drawing the ith individual has probability of selection equal to $p_i > 0$, $\sum_i p_i = 1$, 


while at all subsequent drawings probabilities of selection are equal, we have, in the notation 
of 39.6, 


m = a ao a (Xe) n2, 


м = EQ ON The 0-1), 


and hence that the variance estimator (39.24) is always positive for this selection scheme. 
(A. R. Sen, 1953) 
39.6 Show that if $N > 3$ and $n = 2$ and the sampling is carried out with probabilities $p_i$ 
at the first drawing as in Exercise 39.5, while the second drawing is carried out with the $(N-1)$ 
probabilities in the same proportion as at the first drawing, we have 


1 1 
7 = Pipi (PET ET 


1 * Pr 
бйр: ( = 2) 
т+1,ј 
and that the summation in л; is a minimum for fixed i, j when 
fe lo (гру) 

1-5 (N-2-(ü-p-p) 

Hence show that the variance estimator (39.24) is always positive for this selection scheme. 
(A. R. Sen, 1953) 


39.7 For the selection scheme of Exercise 39.5, show that for any set of pi we must have 


m> =, all i, and using (39.16) show that only опе л; at most equals unity. 
39.8 Show that for equal-probabilities random sampling without replacement both (39.22) 
and (39.24) reduce to the estimator of variance Vim) given in (39.12). 


39.9 Show that the z, defined at (39.25) have variance 


Иба) =®рау® йш)... E pu- E W 
а oO (u—1) (м) Ёш) 


м1 
-®рау® реу... E {м-® %), u>2, 
а) (2) (u—1) fel 


where each summation is over all available units at the indicated drawing. Using this result, 
show that for л = 2, the statistic (39.27) has variance 


V® = PIE YO) Ns DE ba) E29 — Y pay (Ми -]. 
4LUo bay о) DPD а) 
(Raj, 1956) 
39.10 Show that if z defined at (39.26) is to reduce to N times the sample mean т when 
sampling is with equal probabilities, the weights must satisfy 
P N«-» 
a (N-2)-» 
these (n—1) conditions, together with Ecu = 1, determining the weights uniquely. 
(cf. Raj, 1956) 


2<u<n, 


39.11 To select a sample of $n$ individuals from a population of $N$ individuals $y_s$, using 
probabilities $\pi_s$ ($s = 1, 2, \ldots, N$), consider the following procedure: 
(1) Select a value $M >$ the largest individual $\pi_s$, say $\pi_{\max}$, and choose a number $r$ between 
0 and $M$ at random with equal probabilities. Select an integer $s_1$ by the same process from 
the integers 1 to $N$. If $r \le \pi_{s_1}$, accept $y_{s_1}$ for the sample; if $r > \pi_{s_1}$, repeat this entire operation. 
(2) Select further values $s_t$ successively without replacement from the integers 1 to $N$. In 
this sequence accept for the sample every $y_{s_t}$ for which the cumulative sum $r + \sum_{u=2}^{t} \pi_{s_u}$ first 
exceeds one of the values $M, 2M, 3M, \ldots, (n-1)M$. 

Show that if $M < \dfrac{1 - \pi_{\max}}{n-1}$ this procedure selects the sample, with the required $\pi_s$, without 
replacement. Show that operation (1) repeated $n$ times achieves the required $\pi$ with replacement. 
(Lahiri, 1951; Grundy, 1954) 


39.12 $x_1, x_2, \ldots, x_n$ are uncorrelated variates with the same mean $\mu$ and variances not 
necessarily equal. Show that $\bar{x} = \sum_{i=1}^{n} x_i / n$ is an unbiassed estimator of $\mu$, and that its sampling 
variance is unbiassedly estimated by $\sum_i (x_i - \bar{x})^2 / \{n(n-1)\}$. 


39.13 Show that the generalizations of binomial and Poisson sampling discussed in 5.10 
are special cases of stratified random sampling, and hence derive the variances given at (5.26) 
and below (5.27). 


39.14 In 39.15-18, show that 


D = Ё(тк)— Vw) yaw) z М (ши)? 
1 
ата) 


where 
Р = ММУ Моў Œ Nio)!) 20, 


О = n(E Nio? -N Eaj) «0, 
R-N:Eg-(ENa)»0, 


$P = 0$ holding if and only if all $\sigma_l$ are equal. Hence, using (39.46), show that as $N \to \infty$ 
with $N_l/N$ fixed, the relationship (39.49) holds. 
Show further that if P = 0 and n is small enough, D is negative, and that if N—n is also 
small enough, 
(тк) < V(Amy). 
(Armitage, 1947) 


39.15 In Exercise 39.14, show that if the $\sigma_l$ are sufficiently unequal, $P - R$ will often be 
positive, and that then $D$ is a decreasing function of $n$, so that the reduction in variance through 
stratification declines as $n$ increases. Hence show that if any $n_l$ in the MV allocation exceeds 
the corresponding $N_l$, we should increase the gain from stratification by putting $n_l = N_l$, and 
distributing only the $n - N_l$ other observations by the MV allocation. 

(Armitage, 1947) 


39.16 Show that for any sample design in which the sample mean m is unbiassed for the 
population mean и, 


$$E\left\{\frac{1}{n} \sum_{i=1}^{n} (y_i - m)^2\right\} = \sigma^2 - \operatorname{var} m,$$
where $\sigma^2 = E(y - \mu)^2$. Verify the result for random sampling with equal probabilities without 
replacement. 
(The result is due to L. Kish.) 
39.17 In 39.24, show that if there are $k = 2$ strata, and the whole of the second stratum 
is sampled ($n_2 = N_2$), the best choice of a cutting-point $c_1$ in the range of $y$ to minimize the 
sampling variance (39.39) is given by 


(AS 
1 1 m 1 


39.18 Show that if sample size n; is to be the same in all strata, the cutting-points in the 
range of y which minimize (39.39) are given (cf. 39.25) by 


Nifo? + (cr—mi)?} = № {о Goma» 


(cf. Dalenius, 1952) 


(Cochran, 1961) 


39.19 A large random sample is drawn from a population and the individuals classified 
into k strata after selection. Show that if we use the stratified estimator of the population 
mean given by (39.38) in these circumstances, the expected value of its variance (taking account 
of the random variation of the stratum sample sizes) is approximately equal to V/(iusr) given 
at (39.45). 


39.20 Show that if stratum sample sizes are chosen to minimize the variance of the estimated 
sampling variance (39.40), and m/N; is negligible, the MV allocation formula (39.42) is replaced 
by 

m LL oU Du 
№ È Mofu) 
П 


where $\beta_{2l}$ is the moment-ratio $\mu_4/\mu_2^2$ for the $l$th stratum. This allocation will therefore differ 
from (39.42) unless $\beta_{2l}$ is constant, all $l$. 
(Ross, 1961) 


39.21 Show that if stratum sample sizes m differ from those defined by the MV allocation 
formulae (39.42) by amounts Am, (Л) is increased by approximately the factor 


1 m X ((An)*/m). 
"з 
39.22 Using 39.2, show that in cluster sampling with equal cluster sizes and a single cluster 


of size $n$ selected, (39.7) gives the sampling variance of the sample mean, $\rho$ being the 
intra-class correlation coefficient (cf. 26.25-6, Vol. 2) for clusters. 


39.23 Show that the generalizations of binomial and Poisson sampling discussed in 5.11 
are special cases of two-stage sampling, and hence derive the variances at (5.29) and (5.30). 


39.24 Establish (39.94) from (39.92). 
39.25 Deduce (39.105) as a special case of (39.104). 


39.26 Use Exercise 39.12 to derive the result (39.110) for the estimation of variance in 
multi-stage sampling where the first stage is sampled with replacement with the same set of 


unequal probabilities at each drawing. 
(cf. Durbin, 1953) 
39.27 Using (39.32), show that when the same set of probabilities is used at each first- 
stage drawing with replacement, the middle term on the right of (39.107) equals 


1íXN 2 
—-—4 D aP E(t)?» 
СЕ 1 
and hence that the difference between the with-replacement sampling variance (39.107) and 
the without-replacement variance (39.97) is expressible as 


UE. з HN 
pat WE ey -EÙ a E (ti) E (ti), 
m i= >1 ї >l 21 
j 

and that this may be positive or negative. If sampling is with equal probabilities at the first 
stage, show that D > 0. 

Show that if the with-replacement estimator of variance (39.110) is used when sampling is 
„ҮР, so that if with- 

p 

out-replacement sampling has the smaller variance (D > 0), use of the with-replacement estimator 
tends to overestimate the variance. 


actually carried out without replacement, it has bias exactly equal to 


(Durbin, 1953) 


39.28 In a multi-stage design, sampling at the $s$th stage ($s \ge 1$) is with replacement, and 
sampling at the $(s+1)$th stage is with equal probabilities. Show (cf. Exercise 39.27) that if 
any unit selected $r$ times at the $s$th stage has its $(s+1)$th-stage sample size multiplied by $r$, the 
variance of the estimator (39.68) of the population mean is less than if $r$ independent $(s+1)$th- 
stage samples had been selected within the unit. 


39.29 In a multi-stage design, sampling at the sth stage (s>1) is with replacement. Show 
that if any unit selected $r$ times at this stage has the $(s+1)$th stage of sampling carried out only 
once within it, and a weight of $r$ given to the results, the variance of the estimator (39.68) of 
the population mean is greater than if $r$ independent $(s+1)$th-stage samples had been selected 
within the unit. 


39.30 In sampling with unequal probabilities without replacement of л individuals from a 
population of N, the probability that a given sample of individuals is selected in a particular 
order is pu), and the probability that the same sample is selected in any order is ps = Уро, 

(9) 


the summation being over ће л! possible orderings; X ps = 1, where the summation is over 
" 


the 5) possible saraples. If z(, is a statistic which may take account of the order of selection, 


and zs = X pis) %(s)/Ps, show using (39.72-3) that 
в) 


E(zs) = Е(д)) 
and 
V(zs) < Vizo), 


the last equality holding only when all values of 2.) are the same. Use this result to show that 
Z,) defined at (39.27) can be improved upon as an estimator, and that PE) defined at (39.31) 
can similarly be improved upon. 

(cf. M. N. Murthy (1957) and Pathak (1961a)) 


39.31 In sampling with unequal probabilities without replacement, suppose that the N 
population individuals have probabilities of selection at the first drawing equal to (ур, i = 1, 
2,..., N, and that at later drawings the probabilities of selection of hitherto undrawn individuals 
remain in the same proportions as at the first drawing. Show that (39.25) may then be written 


EET 
= 
(pay 
5s Уш) Еи 
зи = X yt (- У wpm), u=2,3,...,0, 
r=1 (Pu) r=1 


and hence that the improved version of 2) given by Exercise 39.30 may be written 
" 
# = У ури /рв 
і=1 
where p;i; is the conditional probability of selecting the observed sample given that y; is selected 


first. 
(M. N. Murthy, 1957) 


CHAPTER 40 
SAMPLE SURVEY THEORY: SUPPLEMENTARY INFORMATION 


40.1 In Chapter 39, we were concerned with problems of sample design. We 
now turn to a question which arises whatever that design may be, namely the improve- 
ment of the efficiency of estimation. 

In 39.8 and 39.13 we touched upon the fact that knowledge of a variable highly 
correlated with that being studied may assist us to choose probabilities of selection, or 
to construct strata, to make the sampling variance of the estimator small. Such supple- 
mentary information concerning an auxiliary variable may also be used directly to 
change the form of the estimator in order to improve its efficiency. 


Ratio estimators and their modifications 

40.2 Suppose, as in 39.2, that in sampling a finite population with equal probabilities 
without replacement we wish to estimate the population mean of $y$, which we now 
write $\mu_y$, but that we know the value of the population mean of $x$, $\mu_x$, and can observe $x$, 
as well as $y$, for the sample values. We clearly ought to be able to turn this extra 
knowledge to good account. We assume $\mu_x \neq 0 \neq \mu_y$. 

Two intuitively reasonable estimators of $\mu_y$ are 


$$\hat\mu_y = \mu_x m_y / m_x \qquad (40.1)$$
and 

$$\tilde\mu_y = \mu_x m_{y/x}, \qquad (40.2)$$
where $m$ denotes the sample mean of the variable which is its suffix. (40.1) uses the 
ratio of sample means, and (40.2) the mean of sample ratios, of $y$ and $x$ as a "correction 
factor" to the known $\mu_x$. 
The expectations of (40.1-2) follow at once from observing that by the definition 
of a covariance $C$, 

$$C\left(\frac{m_y}{m_x}, m_x\right) = E(m_y) - E\left(\frac{m_y}{m_x}\right) E(m_x) = \mu_y - \frac{E(\hat\mu_y)}{\mu_x} \, \mu_x,$$
so that 
$$E(\hat\mu_y) = \mu_y - C\left(\frac{m_y}{m_x}, m_x\right). \qquad (40.3)$$

(*) This is to be distinguished from the use of supplementary information (an instrumental 
variable), as in 29.33-46, Vol. 2, to achieve identifiability in estimation. It is, however, analogous 
to the use of a concomitant variable in the Analysis of Covariance (cf. 35.67-8) in so far as 
the latter reduces residual variation. 
Similarly 
$$C\left(\frac{y}{x}, x\right) = E(y) - E\left(\frac{y}{x}\right) E(x) = \mu_y - \frac{E(\tilde\mu_y)}{\mu_x} \, \mu_x,$$
so that 
$$E(\tilde\mu_y) = \mu_y - C\left(\frac{y}{x}, x\right). \qquad (40.4)$$

(40.3-4) show that both estimators are in general biassed. Furthermore, since a 
covariance between two variables cannot exceed in absolute value the product of their 
standard deviations (by the Cauchy-Schwarz inequality), we see that 


$$|E(\hat\mu_y) - \mu_y| \le \left\{V\left(\frac{m_y}{m_x}\right) V(m_x)\right\}^{1/2}, \qquad |E(\tilde\mu_y) - \mu_y| \le \left\{V\left(\frac{y}{x}\right) V(x)\right\}^{1/2}. \qquad (40.5)$$


(40.5) shows that there is a radical difference between the estimators, since as sample 
size $n \to \infty$, $V(m_y/m_x)$ and $V(m_x)$, variances of sample means, are of order $n^{-1}$, and so is 
the bias in $\hat\mu_y$; no such effect occurs with $\tilde\mu_y$, since $V(y/x)$ and $V(x)$ do not depend on $n$ 
at all. In fact, it is easy to see from their definitions (40.1-2) that $\hat\mu_y$ is a consistent 
estimator, since $m_y \to \mu_y$ and $m_x \to \mu_x$; but that $\tilde\mu_y \to \mu_x \mu_{y/x}$, which will not in general 
be equal to $\mu_y$. The bias in $\hat\mu_y$ is studied in detail in 40.9 below. First, we see how 
the bias in $\tilde\mu_y$ may be removed. 


40.3 From (40.4) it is clear that we only need an unbiassed estimator of $C(y/x, x)$ 
to eliminate the bias in $\tilde\mu_y$. Since $y/x$ is observed for every sample member, we can 
calculate the sample covariance of $y/x$ and $x$. By the bivariate analogue of (12.109), 
it is the k-statistic $k_{11}$ in the sample which is unbiassed for $K_{11}$ in the finite population, 
and thus the unbiassed estimator of the covariance in the population is 


$$\hat C\left(\frac{y}{x}, x\right) = \frac{N-1}{N} \cdot \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{y_i}{x_i} - m_{y/x}\right)(x_i - m_x) = \frac{N-1}{N} \cdot \frac{n}{n-1} \, (m_y - m_{y/x} m_x). \qquad (40.6)$$
Thus, from (40.2) and (40.4), an unbiassed estimator of $\mu_y$ is 
$$\tilde\mu_y' = \mu_x m_{y/x} + \frac{N-1}{N} \cdot \frac{n}{n-1} \, (m_y - m_{y/x} m_x), \qquad (40.7)$$
first proposed by Hartley and Ross (1954). If $(N-1)/N$ is near unity, it reduces to 


$$\tilde\mu_y' = \mu_x m_{y/x} + \frac{n}{n-1} \, (m_y - m_{y/x} m_x). \qquad (40.8)$$
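The closed form used for the sample covariance in (40.6) rests on a one-line identity, written out here for convenience. It uses only the definitions of the sample means, with $m_{y/x}$ denoting the sample mean of the ratios $y_i/x_i$:

```latex
\sum_{i=1}^{n}\left(\frac{y_i}{x_i}-m_{y/x}\right)(x_i-m_x)
  = \sum_{i=1}^{n} y_i \;-\; m_x\sum_{i=1}^{n}\frac{y_i}{x_i}
    \;-\; m_{y/x}\sum_{i=1}^{n} x_i \;+\; n\,m_{y/x}\,m_x
  = n\,m_y - n\,m_x m_{y/x} - n\,m_{y/x} m_x + n\,m_{y/x} m_x
  = n\,(m_y - m_{y/x}\,m_x).
```

Dividing by $(n-1)$ gives the factor $n/(n-1)$ that multiplies $(m_y - m_{y/x} m_x)$ in (40.6) and (40.7).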
40.4 The removal of the bias in $\hat\mu_y$ is less important, since we have seen in 40.2 
that it is consistent, but it is worth considering since ratio estimators are sometimes 
used when $n$ is small. We cannot directly use the device just used in 40.3, for the 
covariance we now need to estimate is $C(m_y/m_x, m_x)$ in (40.3), and a single sample supplies 
only one value of $m_y$ and of $m_x$. However, a simple approximation is easily obtained. 
When $n = 1$, $C(m_y/m_x, m_x)$ is identical with $C(y/x, x)$. Moreover, the covariance between 
sample means of jointly distributed variables is inversely proportional to sample 
size (this follows, e.g., from Rule 10 for k-statistics in 12.14, or may easily be proved 
directly), and the same will hold approximately here, where we seek the covariance 
of one mean and the ratio of another to the first mean. Thus 


$$C\left(\frac{m_y}{m_x}, m_x\right) \approx \frac{1}{n} \, C\left(\frac{y}{x}, x\right),$$


and using (40.6) we find from (40.3) the approximately unbiassed estimator 
$$\hat\mu_y' = \hat\mu_y + \frac{1}{n} \, \hat C\left(\frac{y}{x}, x\right) = \mu_x \frac{m_y}{m_x} + \frac{N-1}{N} \cdot \frac{1}{n-1} \, (m_y - m_{y/x} m_x), \qquad (40.9)$$


a result differently obtained by Nieto de Pascual (1961). The absence of the factor $n$ in 
the second term of (40.9), compared with (40.7), again illustrates the different orders of 
magnitude of the biases in (40.1-2). 


40.5 We now have to examine the variances of the alternative modified ratio 
estimators (40.7) and (40.9) as a guide to choosing between them in different circum- 
stances. 

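We consider only the case when $N \to \infty$, so that sampling is effectively simple 
random. Using (40.6), we rewrite (40.7) as 


$$\tilde\mu_y' = \mu_x m_{y/x} + k_{11}, \qquad (40.10)$$
where $k_{11}$ is the k-statistic of the variables $y/x$ and $x$. Thus 
$$V(\tilde\mu_y') = \mu_x^2 V(m_{y/x}) + 2\mu_x C(m_{y/x}, k_{11}) + V(k_{11}), \qquad (40.11)$$
which in the notation of 13.2 is written 
$$V(\tilde\mu_y') = \mu_x^2 V(m_{y/x}) + 2\mu_x \, \kappa\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} + \kappa\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}. \qquad (40.12)$$
Now by 12.14, 
$$\kappa\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \frac{\kappa_{21}}{n} = \frac{\mu_{21}}{n}, \qquad (40.13)$$


using (3.80); while (13.7) and (3.81) give 


$$\kappa\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} = \frac{\mu_{22}}{n} + \frac{\mu_{20}\mu_{02}}{n(n-1)} - \frac{(n-2)\mu_{11}^2}{n(n-1)}, \qquad (40.14)$$
where the cumulants and moments in (40.13-14) refer to the joint distribution of $y/x$ 
and $x$. We therefore rewrite (40.12) in this notation as 


$$V(\tilde\mu_y') = \frac{1}{n}\left\{\mu_x^2 \mu_{20} + 2\mu_x \mu_{21} + \mu_{22} - \mu_{11}^2 + \frac{1}{n-1}(\mu_{20}\mu_{02} + \mu_{11}^2)\right\}. \qquad (40.15)$$


(40.15) may be usefully simplified. First we observe that by definition 


$$E\left[\left\{\left(\frac{y}{x} - \mu_{y/x}\right)(x - \mu_x) - \mu_{11}\right\}^2\right] = E\left\{\left(\frac{y}{x} - \mu_{y/x}\right)^2 (x - \mu_x)^2\right\} - \mu_{11}^2 = \mu_{22} - \mu_{11}^2$$
in the notation of (40.15), which is thus equivalent to 


$$nV(\tilde\mu_y') = \mu_x^2 \mu_{20} + 2\mu_x \mu_{21} + E\left[\left\{\left(\frac{y}{x} - \mu_{y/x}\right)(x - \mu_x) - \mu_{11}\right\}^2\right] + \frac{1}{n-1}(\mu_{20}\mu_{02} + \mu_{11}^2). \qquad (40.16)$$
Now consider the identity 


$$\mu_x\left(\frac{y}{x} - \mu_{y/x}\right) + \left\{\left(\frac{y}{x} - \mu_{y/x}\right)(x - \mu_x) - \mu_{11}\right\} = y - \mu_{y/x} x - \mu_{11}. \qquad (40.17)$$


If we take the variance of the left-hand side of (40.17), we see that it is exactly the 
first three terms on the right of (40.16). We may therefore replace these by the variance 
of the right-hand side of (40.17), obtaining 


$$nV(\tilde\mu_y') = V(y - \mu_{y/x} x) + \frac{1}{n-1}(\mu_{20}\mu_{02} + \mu_{11}^2),$$


and returning to our original notation, this is 


$$nV(\tilde\mu_y') = V(y) + \mu_{y/x}^2 V(x) - 2\mu_{y/x} C(y, x) + \frac{1}{n-1}\left\{V\left(\frac{y}{x}\right) V(x) + C^2\left(\frac{y}{x}, x\right)\right\}, \qquad (40.18)$$


a result obtained by Goodman and Hartley (1958). As $n \to \infty$, the term in braces in 
(40.18) may be neglected and 
$$nV(\tilde\mu_y') \approx V(y) + \mu_{y/x}^2 V(x) - 2\mu_{y/x} C(y, x). \qquad (40.19)$$

(40.18) is most easily estimated by expressing its form (40.12) in terms of cumulants 
and using k-statistics to estimate them; Goodman and Hartley (1958) give computing 
formulae. 

Robson (1957) generalized (40.18), and also its unbiassed estimator, to take account 
of the finiteness of the population. 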


40.6 We may similarly obtain the variance of (40.9), which we rewrite analogously 
to (40.10) as 


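$$\hat\mu_y' = \mu_x \frac{m_y}{m_x} + \frac{k_{11}}{n}. \qquad (40.20)$$
We see that 


$$V(\hat\mu_y') = \mu_x^2 V\left(\frac{m_y}{m_x}\right) + \frac{2\mu_x}{n} \, C\left(\frac{m_y}{m_x}, k_{11}\right) + \frac{1}{n^2} V(k_{11}).$$
Just as we did in deriving $\hat\mu_y'$ in 40.4, we approximate by writing 
$$C\left(\frac{m_y}{m_x}, k_{11}\right) \approx C(m_{y/x}, k_{11}),$$
so that 

$$V(\hat\mu_y') \approx \mu_x^2 V\left(\frac{m_y}{m_x}\right) + \frac{1}{n}\left\{2\mu_x C(m_{y/x}, k_{11}) + \frac{1}{n} V(k_{11})\right\} < \mu_x^2 V\left(\frac{m_y}{m_x}\right) + \frac{1}{n}\left\{V(\tilde\mu_y') - \mu_x^2 V(m_{y/x})\right\} \qquad (40.21)$$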


from (40.11). All three variances on the right of (40.21) are of order $n^{-1}$, so that the 
second term is of relative order $n^{-1}$ compared with the first term on the right. Our 
approximation may therefore be written 


$$V(\hat\mu_y') = \mu_x^2 V\left(\frac{m_y}{m_x}\right)\{1 + O(n^{-1})\}. \qquad (40.22)$$


Since the bias in the unadjusted estimator $\hat\mu_y$ was seen in 40.2 to be of order $n^{-1}$, the 
bias in $\hat\mu_y'$ is of no greater order, and its square will be of no greater order than $n^{-2}$. 
Thus the mean-square-error (more appropriate than the variance in view of the biassedness 
of $\hat\mu_y'$) may be written 

$$E\{(\hat\mu_y' - \mu_y)^2\} = \mu_x^2 V\left(\frac{m_y}{m_x}\right)\{1 + O(n^{-1})\}. \qquad (40.23)$$
A more precise approximation is given by Nieto de Pascual (1961). The leading term 
in (40.22) and (40.23) is the variance of the unmodified estimator $\hat\mu_y$, and is easily 
evaluated to order $n^{-1}$ by using (10.17), which here gives 


$$V\left(\frac{m_y}{m_x}\right) \approx \left(\frac{\mu_y}{\mu_x}\right)^2 \frac{1}{n}\left\{\frac{V(y)}{\mu_y^2} + \frac{V(x)}{\mu_x^2} - \frac{2C(y, x)}{\mu_y \mu_x}\right\}, \qquad (40.24)$$


so that (40.23) becomes 
$$nE\{(\hat\mu_y' - \mu_y)^2\} \approx V(y) + \left(\frac{\mu_y}{\mu_x}\right)^2 V(x) - 2\frac{\mu_y}{\mu_x} C(y, x). \qquad (40.25)$$
(40.25) may be estimated with slight bias by replacing $\mu_y/\mu_x$ by $m_y/m_x$ and the variances 
and covariance by their unbiassed estimators. This gives an estimator of the mean-square 
error 


$$\hat E\{(\hat\mu_y' - \mu_y)^2\} = \frac{1}{n}\left\{\hat V(y) + \left(\frac{m_y}{m_x}\right)^2 \hat V(x) - 2\frac{m_y}{m_x} \hat C(y, x)\right\}.$$

40.7 We now compare (40.19) and (40.25). Following Goodman and Hartley 
(1958), the difference may be written 


$$n\left[E\{(\hat\mu_y' - \mu_y)^2\} - V(\tilde\mu_y')\right] \approx V(x)\left\{\left(\frac{\mu_y}{\mu_x} - \beta_{yx}\right)^2 - (\mu_{y/x} - \beta_{yx})^2\right\}. \qquad (40.26)$$


(40.26) makes it clear that the modified ratio of means estimator $\hat\mu_y'$ is more, or less, 
efficient than the modified mean of ratios estimator $\tilde\mu_y'$ according as the linear regression 
coefficient of $y$ upon $x$, $\beta_{yx} = C(y, x)/V(x)$, is nearer to the population ratio of means 
or to the population mean of ratios. In practice, the former situation seems to be 
more common, and the slightly biassed estimator (40.9) is then preferable to the unbiassed 
(40.7). 
(40.9) is more efficient than the ordinary sample mean estimator $m_y$ if and only if 
the right-hand side of (40.25) is less than its first term, i.e. if 
$$\frac{\mu_y}{\mu_x}\left(\frac{\mu_y}{\mu_x} - 2\beta_{yx}\right) < 0, \quad \text{i.e.} \quad \beta_{yx} > \frac{1}{2}\frac{\mu_y}{\mu_x} > 0 \quad \text{or} \quad \beta_{yx} < \frac{1}{2}\frac{\mu_y}{\mu_x} < 0. \qquad (40.27)$$
We have thus characterized the efficiency of $\hat\mu_y'$, compared to both $\tilde\mu_y'$ and $m_y$, in terms 
of the relative magnitudes of the regression coefficient $\beta_{yx}$ and the population ratio 
of means $\mu_y/\mu_x$. Since $\hat\mu_y'$ essentially estimates by the sample ratio of means $m_y/m_x$, 
this is as we should expect. 
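The two criteria of this section are easy to evaluate for any fully known population. The sketch below is illustrative: the function name and dictionary keys are hypothetical, variances and covariances are taken with divisor $N$, and the criteria are large-sample ones, so applying them to a population as small as the four-point example quoted in 40.8 is only a rough guide.

```python
def compare_modified_estimators(population):
    """Evaluate the large-sample criteria of 40.7 for a list of (x, y) pairs.

    Returns the regression coefficient C(y, x)/V(x), the ratio of means, the
    mean of ratios, and two booleans: whether the modified ratio-of-means
    estimator (40.9) is preferred to the modified mean-of-ratios estimator
    (40.7), and whether (40.9) is preferred to the plain sample mean.
    """
    N = len(population)
    mu_x = sum(x for x, _ in population) / N
    mu_y = sum(y for _, y in population) / N
    mean_of_ratios = sum(y / x for x, y in population) / N
    var_x = sum((x - mu_x) ** 2 for x, _ in population) / N
    cov_xy = sum((x - mu_x) * (y - mu_y) for x, y in population) / N
    beta = cov_xy / var_x
    ratio_of_means = mu_y / mu_x
    return {
        "beta": beta,
        "ratio_of_means": ratio_of_means,
        "mean_of_ratios": mean_of_ratios,
        # beta nearer to the ratio of means than to the mean of ratios
        "prefer_40_9_over_40_7": abs(beta - ratio_of_means) < abs(beta - mean_of_ratios),
        # (mu_y/mu_x) * (mu_y/mu_x - 2 beta) < 0
        "40_9_beats_sample_mean": ratio_of_means * (ratio_of_means - 2 * beta) < 0,
    }


if __name__ == "__main__":
    print(compare_modified_estimators([(2, 2), (2, 6), (4, 6), (6, 10)]))
```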


Olkin (1958) generalizes the theory of the unmodified $\hat\mu_y$ to the case where $x$ is a 
vector. See also P. S. R. S. Rao and Mudholkar (1967). 


40.8 The approximately unbiassed estimator (40.9) was obtained by directly 
estimating the bias in $\hat\mu_y$ given by (40.3). We could, alternatively, have reduced the 
order of magnitude of the bias by using Quenouille's method, described in 17.10. This 
would involve the calculation of $\hat\mu_y$ for each of the $n$ different samples of size $(n-1)$ 
which exclude a single observation, averaging these $n$ values, and using (17.10) to 
obtain a modified estimator with bias of order $n^{-2}$ and variance unaffected to order $n^{-1}$ 
(cf. Exercise 17.18), as was seen to be the case for $\hat\mu_y'$ in (40.23). 

Durbin (1959a) used a simpler form of Quenouille's method to modify a general type 
of ratio estimator of form $r = t_1/t_2$ (which includes (40.1) as a particular case), whose 
bias in estimating $E(t_1)/E(t_2)$ is assumed to be of order $n^{-1}$. If the same statistic $r$ 
is calculated for the first $\frac{1}{2}n$ and second $\frac{1}{2}n$ observations ($n$ even) and denoted by $r_1$, $r_2$ 
respectively, the modified estimator is 


$$t(r) = 2r - \tfrac{1}{2}(r_1 + r_2). \qquad (40.28)$$


If the regression of $t_1$ on $t_2$ is linear with constant variance of order $n^{-1}$, and $t_2$ itself 
is normally distributed with variance of order $n^{-1}$, (40.28) was shown to have bias of 
order $n^{-2}$ and variance which agrees to order $n^{-1}$ with that of $r$ but is smaller asymptotically 
when terms of order $n^{-2}$ and lower are taken into account. A similar result 
holds when $t_2$ has a Gamma distribution. 


J. N. K. Rao (1965) and J. N. K. Rao and Webster (1966) show that use of Quenouille's 
original method gives even smaller bias and mean-square error in both the normal and 
the Gamma distribution cases. J. N. K. Rao (1967) shows that this estimator has smaller 
mean-square-error than (40.7), (40.9), (40.28) and other estimators. 


This result is more general than at first appears, because $t_1$ and $t_2$ will usually be 
asymptotically bivariate normally distributed with variances of order $n^{-1}$ by the Central 
Limit theorem (this is certainly true of $m_y$ and $m_x$ in (40.1)) and the linear regression 
assumption therefore satisfied. It follows that in such a situation there is nothing 
to be lost by bias-elimination using (40.28). 
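Both bias-reduction devices of this section are short to compute for the ratio of sample means. The sketch below is illustrative only: the function names are hypothetical, and the leave-one-out version assumes that (17.10) takes the usual form $n r - (n-1)\bar{r}_{(-i)}$, where $\bar{r}_{(-i)}$ is the average of the $n$ ratios computed with one observation omitted. The half-sample version follows (40.28) directly.

```python
def ratio(xs, ys):
    """Ratio of sample means m_y / m_x (equivalently, ratio of sample totals)."""
    return sum(ys) / sum(xs)


def quenouille_ratio(xs, ys):
    """Leave-one-out bias-reduced ratio: n*r - (n-1)*mean(r with obs. i dropped)."""
    n = len(xs)
    r = ratio(xs, ys)
    leave_one_out = [ratio(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
                     for i in range(n)]
    return n * r - (n - 1) * sum(leave_one_out) / n


def durbin_split_ratio(xs, ys):
    """Half-sample version (40.28): t(r) = 2r - (r_1 + r_2)/2, n even."""
    n = len(xs)
    if n % 2:
        raise ValueError("the half-sample method needs an even sample size")
    h = n // 2
    r1 = ratio(xs[:h], ys[:h])
    r2 = ratio(xs[h:], ys[h:])
    return 2 * ratio(xs, ys) - 0.5 * (r1 + r2)


if __name__ == "__main__":
    xs, ys = [2, 4], [2, 6]          # one sample of size 2, values illustrative
    print(quenouille_ratio(xs, ys), durbin_split_ratio(xs, ys))
```

For $n = 2$ the two functions coincide, since omitting one observation leaves exactly the other half-sample; for larger $n$ they differ, and the half-sample version depends on how the observations are split.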
A simple numerical example with $n = 2$, $N = 4$, discussed by Goodman and Hartley 
(1958), Durbin (1959a) and Nieto de Pascual (1961), gives the results: 


Population values of (x, y): (2, 2), (2, 6), (4, 6) and (6, 10) 


Estimator                 Equation            Mean-square-error 
$t(\hat\mu_y)$            (40.28), (40.1)     0.38 
$\hat\mu_y'$              (40.9)              0.44 
$\tilde\mu_y'$            (40.7)              0.56            (40.29) 
$\hat\mu_y$               (40.1)              0.92 
$\tilde\mu_y$             (40.2)              2.41 
$m_y$ (sample mean)                           2.67 


Exercise 40.17 asks the reader to verify these values. 
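The enumeration behind such a table can be sketched in a few lines. The code below takes the four population points exactly as printed above, lists all six samples of size 2, and averages squared errors about the population mean of $y$; the dictionary keys and function names are illustrative labels rather than notation from the text, and the half-sample adjustment (40.28) is applied with each half consisting of a single observation.

```python
from itertools import combinations

population = [(2, 2), (2, 6), (4, 6), (6, 10)]   # (x, y) values as printed
N, n = len(population), 2
mu_x = sum(x for x, _ in population) / N
mu_y = sum(y for _, y in population) / N


def estimates(sample):
    """All six estimators of mu_y for one sample of (x, y) pairs."""
    xs = [x for x, _ in sample]
    ys = [y for _, y in sample]
    m_x, m_y = sum(xs) / n, sum(ys) / n
    m_r = sum(y / x for x, y in sample) / n                 # mean of ratios
    adj = (N - 1) / N * (m_y - m_r * m_x) / (n - 1)         # second term of (40.9)
    half = n // 2
    r1 = sum(ys[:half]) / sum(xs[:half])
    r2 = sum(ys[half:]) / sum(xs[half:])
    return {
        "(40.28) applied to (40.1)": mu_x * (2 * m_y / m_x - 0.5 * (r1 + r2)),
        "(40.9)": mu_x * m_y / m_x + adj,
        "(40.7)": mu_x * m_r + n * adj,
        "(40.1)": mu_x * m_y / m_x,
        "(40.2)": mu_x * m_r,
        "sample mean": m_y,
    }


def all_estimates():
    """Estimates for every one of the equally likely samples of size n."""
    return [estimates(s) for s in combinations(population, n)]


def mean_square_errors():
    """Mean-square error of each estimator over all possible samples."""
    rows = all_estimates()
    return {name: sum((row[name] - mu_y) ** 2 for row in rows) / len(rows)
            for name in rows[0]}


if __name__ == "__main__":
    for name, mse in mean_square_errors().items():
        print(f"{name:28s} {mse:.4f}")
```

The ordering of the six mean-square errors produced in this way can be set beside (40.29); the estimator (40.7) also comes out exactly unbiassed over the six samples, as 40.3 requires.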
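40.9 If we write (40.1) in the form 


$$\frac{\hat\mu_y}{\mu_x} = \frac{m_y}{m_x} = \frac{m_y}{\mu_x}\left(1 + \frac{m_x - \mu_x}{\mu_x}\right)^{-1} \qquad (40.30)$$


and expand the negative binomial into a Taylor series, valid with probability 1 as 
$N, n \to \infty$, we find on taking expectations 


$$E\left(\frac{m_y}{m_x}\right) = \frac{\mu_y}{\mu_x}\left[1 + \left(\frac{1}{n} - \frac{1}{N}\right)\left\{\frac{V(x)}{\mu_x^2} - \frac{C(y, x)}{\mu_x \mu_y}\right\}\right] + O\left(\frac{1}{M^2}\right), \qquad (40.31)$$

where $M$ stands for $n$ or $N$ indifferently. Thus the estimator 
$$u = \frac{m_y}{m_x}\left[1 - \left(\frac{1}{n} - \frac{1}{N}\right)\left\{\frac{s_x^2}{m_x^2} - \frac{s_{xy}}{m_x m_y}\right\}\right] \qquad (40.32)$$


has the first-order bias of $m_y/m_x$ removed. In (40.32), the sample variance and covariance 
are defined with $(n-1)$ as divisor, as usual. 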

It is a straightforward, though somewhat tedious, matter to evaluate the mean and 
variance of u, using the results (the first three of which were given at (12.117), (12.119) 
and (12.121), the remainder being derivable by the methods of Example 13.2): 


3 
Етну = (ng) 
E(m,— u,)! = 3a? K3+O(n-), 
E(m,— n;J(s5— Къ) = о, Кы 
Ет, nens) = (s ane 
н (40.33) 
Е(т„—ыь)Җ(т,— uy)? = «(2К% + КК) + O(n), 
E(m,— и.) (тш) = 3o Къ Ki + O(n-*), 
E(si- Ky)(m,— My) = а Ka, 
Е(т„—н)(з,— Куу) = = Kay 
Е(ш,— Ky)(m,— My) = %1 Kin 
Here $\alpha_1 = (n^{-1} - N^{-1})$ as at (12.116), and we have dropped the suffix $N$ to $E$, as 
throughout this and the last chapter. Tin (1965) gives the results to order $n^{-2}$, 


E(u) = at ite (2 = A Ca oU M NOR Cx} (40.34) 
2 
Vw) = (2) fa (С»+Сы—2С„)+о}(2С%—4СьС„+С%+СьСы) 


ra A (С»—2С„+сы)}, (40.35) 


where 

$$C_{rs} = K_{rs}/(\mu_x^r \mu_y^s).$$

In a precisely similar way, Tin (1965) gives the results for the simple ratio estimator 
$\hat\mu_y/\mu_x = m_y/m_x$ as 


6) z (EM itaw - a (s 3) (Сы - Cu) +308 Cu (C - С), (40.36) 
v(m) = (9). Cu 2C) +08 (8 C8, - 160 Cu 3-5C3, 4-3 C9, Сы) 

(s) (C204 Cu) (40.37) 
while for $t(m_y/m_x)$ defined by (40.28), he finds 
Е{{т ү = (ie){1- (Cu - Cu/N - 2s (Cu — Cu) 

NS м) (С с), (40.38) 
(т )} = (&) det Cu -2C,) (5 5 +m) Cu (Cn —2C3) 
(Goto) Qeon 


fS (Ca -2Cs сы}. (40.39) 


These results make it clear that the bias in $u$ and in $t(m_y/m_x)$ is very small, with 
no term of order $n^{-1}$ in either, as opposed to the bias in $m_y/m_x$. All three variances 
have the same leading term, which we have already encountered at (40.24), where we 
saw it to be also the mean-square error of the estimator defined by (40.9), taken as a 
ratio to $\mu_x$. The reader is left to show in Exercise 40.13 that to the next order of 
approximation, we have 


Viu)< (8) ДЕ) (40.40) 


and thus $u$ defined at (40.32) seems preferable to the other estimators considered here 
on grounds of bias and mean-square error. 


Tin (1965) also considers another estimator closely related to $u$ (cf. Exercise 40.14) 
and makes comparisons in bivariate normal and other situations. 


Regression estimators 
40.10 Given that we know the population mean $\mu_x$ of a supplementary variable, 
as in 40.2, it is natural to consider the application of the theory of regression to improve 
efficiency in the estimation of $\mu_y$. The simplest form is a linear regression estimator 

$$\hat\mu_y = m_y + b(\mu_x - m_x). \tag{40.41}$$

This is a generalization of (40.1), to which it reduces if we choose $b = m_y/m_x$. However, 
$b$ is usually chosen as the LS regression coefficient of $y$ upon $x$. 

If the linear model (19.8) holds for the relationship between $y$ and $x$, the LS theory 
of Chapter 19 and of 28.12 onwards holds; however, we are dealing here with a finite 
population, and in any case $x$ is a random variable in many applications, so that the LS 
theory cannot hold exactly. Instead, we observe that $m_x$ and $m_y$ are jointly distributed 
with means $\mu_x$, $\mu_y$, variances $V(x)/n$, $V(y)/n$ and covariance $C(y, x)/n$. Thus, from 
(40.41), if we ignore sampling errors in $b$, however it is chosen, 

$$n V(\hat\mu_y) = V(y) + b^2 V(x) - 2b\,C(y, x), \tag{40.42}$$

with unbiassed estimator 

$$s_y^2 + b^2 s_x^2 - 2b\,s_{xy}. \tag{40.43}$$

The asymptotic formula below (40.25) for the estimated mean-square error of the estimator 
considered there, which is also its asymptotic variance and that of the unmodified ratio 
estimator (40.1), is derivable from (40.43) by putting $b = m_y/m_x$ as above. Under the 
Central Limit theorem, $m_y$ and $m_x$ will usually be asymptotically normal, and hence the 
regression estimator (40.41) is so. 


40.11 (40.42) is only an asymptotic result, because only then is the sampling error 
in $b$ negligible. We see that the regression estimator (40.41) is more efficient than the 
sample mean estimator $m_y$ if the right-hand side of (40.42) is less than its first term, 
i.e. if $2b\,C(y, x) > b^2 V(x)$. 

If we choose $b$ as the LS coefficient, which as $n \to \infty$ tends to $C(y, x)/V(x)$, this 
condition is always satisfied if $C(y, x) \neq 0$. Similarly, comparing (40.42) with (40.25), 
we see that the condition for the regression estimator to be more efficient than the ratio 
estimator (40.1) is 

$$V(x)\left(b^2 - \frac{\mu_y^2}{\mu_x^2}\right) < 2\,C(y, x)\left(b - \frac{\mu_y}{\mu_x}\right). \tag{40.44}$$

If $b$ is the LS coefficient and tends to $C(y, x)/V(x)$, (40.44) reduces asymptotically to 

$$V(x)\left(b - \frac{\mu_y}{\mu_x}\right)^2 > 0, \tag{40.45}$$

which holds except when $b = \mu_y/\mu_x$ when, as we have seen, (40.41) will reduce to 
(40.1) asymptotically. There is thus nothing to be lost by using the LS regression 
estimator, at least asymptotically. 
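
The comparison in 40.10-40.11 amounts to evaluating the right-hand side of (40.42) at three choices of $b$: $b = 0$ gives the sample mean, $b = \mu_y/\mu_x$ the asymptotic form of the ratio estimator, and $b = C(y, x)/V(x)$ the LS regression estimator. A minimal sketch follows; the population moments are hypothetical and the function name is an assumption made for illustration.

```python
def scaled_variances(v_y, v_x, c_yx, mu_y, mu_x):
    """Return n * V for the sample mean, ratio and LS regression estimators."""
    def n_var(b):
        # Right-hand side of (40.42) for a fixed coefficient b.
        return v_y + b**2 * v_x - 2 * b * c_yx

    return {
        "sample mean": n_var(0.0),
        "ratio": n_var(mu_y / mu_x),
        "regression": n_var(c_yx / v_x),
    }


# Hypothetical population moments, chosen only for illustration.
result = scaled_variances(v_y=4.0, v_x=1.0, c_yx=1.5, mu_y=10.0, mu_x=5.0)
for name, value in result.items():
    print(f"{name:12s} {value:.3f}")
```

Because the regression choice minimizes the quadratic in $b$, its value can never exceed the other two, which is the content of (40.45).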


40.12 The regression estimator (40.41) is, of course, biassed. To remove this 
bias, we discuss general methods for constructing unbiassed estimators, due to Mickey 
(1959) and W. H. Williams (1961, 1962), which will also throw light upon our earlier 
problems in ratio estimation. 


Unbiassed estimation with a supplementary variable 
40.13 We begin from the observation that, for any constant $a$, the estimator 

$$m_y - a(m_x - \mu_x) \tag{40.46}$$

will be unbiassed for $\mu_y$, but this will not generally be so if $a$ is a statistic calculated 
from the same sample as $m_x$. Suppose now that the sample of $n$ observations is split 
at random into a subsample of $p$ observations and a "remainder" sample of $(n-p)$ 
observations. (To be precise, we may choose the first $p$ observations in the order of 
drawing as the subsample.) We now use the subsample to determine $a$ in (40.46), 
and calculate the means $m_y$, $m_x$ for the remainder sample only; since the remainder 
sample is a random sample from the $(N-p)$ population members not included in the 
subsample, this will give us an unbiassed estimator of the mean of this "remainder" 
population. Moreover, we can express the means in the remainder sample and the 
remainder population in terms of the overall sample and population means and those 
of the subsample, distinguished by an argument $(p)$. Thus (40.46) gives an estimator 

$$\frac{n m_y - p\,m_y(p)}{n-p} - a(p)\left\{\frac{n m_x - p\,m_x(p)}{n-p} - \frac{N\mu_x - p\,m_x(p)}{N-p}\right\},$$

which will be unbiassed for $(N\mu_y - p\,m_y(p))/(N-p)$. Thus the unbiassed estimator 
of $\mu_y$ itself is $(N-p)/N$ times this estimator plus $p\,m_y(p)/N$, which we write 

$$t_p = m_y - a(p)(m_x - \mu_x) - \frac{(N-n)}{N}\frac{p}{(n-p)}\left\{[m_y(p) - m_y] - a(p)[m_x(p) - m_x]\right\}. \tag{40.47}$$

The choice of an integer $p$ is arbitrary in $1 \le p \le n-1$. For given $p$, the function $a(p)$ 
of the subsample is also arbitrary. We therefore have a large class of unbiassed 
estimators of $\mu_y$ which make use of our knowledge of $\mu_x$. 

Exactly the same argument holds in the multivariate situation where $x$ is a vector. 
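
The exact unbiassedness claimed in 40.13 can be checked by averaging over every ordered sample from a small population. The sketch below is illustrative only. It assumes the split-sample estimator has the form $t_p = m_y - a(p)(m_x - \mu_x) - \frac{N-n}{N}\frac{p}{n-p}\{[m_y(p) - m_y] - a(p)[m_x(p) - m_x]\}$, which follows from the construction described above, and it uses the subsample ratio of means as one arbitrary choice of $a(p)$. The population and names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

population = [(2, 2), (2, 6), (4, 6), (6, 10)]   # illustrative (x, y) values
N = len(population)
mu_x = Fraction(sum(x for x, _ in population), N)
mu_y = Fraction(sum(y for _, y in population), N)


def means(obs):
    k = len(obs)
    return Fraction(sum(x for x, _ in obs), k), Fraction(sum(y for _, y in obs), k)


def t_p(ordered_sample, p, a):
    """Split-sample estimator in the form assumed for (40.47)."""
    n = len(ordered_sample)
    sub = ordered_sample[:p]             # first p observations in order of drawing
    m_x, m_y = means(ordered_sample)
    m_x_p, m_y_p = means(sub)
    a_p = a(sub)                         # any statistic of the subsample alone
    correction = Fraction(N - n, N) * Fraction(p, n - p)
    return (m_y - a_p * (m_x - mu_x)
            - correction * ((m_y_p - m_y) - a_p * (m_x_p - m_x)))


def ratio_of_means(sub):
    m_x_p, m_y_p = means(sub)
    return m_y_p / m_x_p


n, p = 3, 2
ordered_samples = list(permutations(population, n))
expectation = sum(t_p(s, p, ratio_of_means) for s in ordered_samples) / len(ordered_samples)
print(expectation, mu_y)
```

Exact rational arithmetic is used so that the equality of the two printed values is not blurred by rounding; any other function of the subsample could be passed as `a`.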


40.14 An undesirable feature of the general class of estimators (40.47) is that 
they depend on the order in which the sample is drawn. We can overcome this by 
considering $t_p$ for every one of the $n!$ possible orderings of the sample and averaging 
to obtain $\bar t_p$; this average sometimes takes a simple form requiring little computation 
from the sample. (This averaging process is exactly the same as we carried out in 
39.11 for similar reasons, although there the results were not computationally simple 
because sampling was with unequal probabilities.) Exercise 39.30, which now simplifies 
since we are sampling with equal probabilities, shows that the averaged estimator $\bar t_p$ 
has variance which is never greater than that of any single $t_p$. 


40.15 If in (40.47) we choose $a(p) = m_{y/x}(p) = \frac{1}{p}\sum_{i=1}^{p} \frac{y_i}{x_i}$ and $p = 1$, it reduces to 

$$t_1 = \mu_x \frac{y_1}{x_1} + \frac{(N-1)}{N}\frac{n}{(n-1)}\left(m_y - \frac{y_1}{x_1}\,m_x\right),$$

where $y_1$, $x_1$ are the values of the first observation drawn. Averaging over all $n!$ 
orderings of the sample, we obtain 

$$\bar t_1 = \mu_x m_{y/x} + \frac{(N-1)}{N}\frac{n}{(n-1)}(m_y - m_{y/x}\,m_x), \tag{40.48}$$

which is identical with the estimator defined at (40.7). What is more, if we choose any other 
value of $p$ and the same $a(p)$ as above, the average value $\bar t_p$ will be the same as (40.48), 
though of course $t_p$ itself will differ. Thus if $a(p) = m_{y/x}(p)$, we get an unbiassed 
estimator based on $m_{y/x}$. 

This result encourages us to look for an exactly unbiassed version of the type (40.1) 
by putting $a(p) = m_y(p)/m_x(p)$. (40.47) becomes 

$$t_p = \mu_x \frac{m_y(p)}{m_x(p)} + \frac{n(N-p)}{N(n-p)}\left(m_y - \frac{m_y(p)}{m_x(p)}\,m_x\right),$$

which when averaged gives a similar form with $m_y(p)/m_x(p)$ replaced by its average. 
If $p = 1$, of course, it reduces to (40.48), since $a(p)$ is then exactly what we had previously. 
The next simplest choice is $p = n-1$, when the average value of $m_y(p)/m_x(p)$ 
over all permutations is seen to be $\frac{1}{n}\sum_{i=1}^{n} \frac{n m_y - y_i}{n m_x - x_i} = R$. Thus 

$$\bar t_{n-1} = \mu_x R + \frac{n(N-n+1)}{N}(m_y - R\,m_x) \tag{40.49}$$

is an unbiassed estimator of $\mu_y$. 
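
The leave-one-out construction of 40.15 is easy to check by enumeration. The sketch below assumes that (40.49) reads $\bar t_{n-1} = \mu_x R + \frac{n(N-n+1)}{N}(m_y - R\,m_x)$, with $R$ the average of the $n$ ratios $(n m_y - y_i)/(n m_x - x_i)$; the population and names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

population = [(2, 2), (2, 6), (4, 6), (6, 10)]   # illustrative (x, y) values
N, n = len(population), 3
mu_x = Fraction(sum(x for x, _ in population), N)
mu_y = Fraction(sum(y for _, y in population), N)


def leave_one_out_estimator(sample):
    """Assumed reading of (40.49), built on ratios that omit one observation."""
    sum_x = sum(x for x, _ in sample)
    sum_y = sum(y for _, y in sample)
    m_x, m_y = Fraction(sum_x, n), Fraction(sum_y, n)
    # R: average over i of (n*m_y - y_i) / (n*m_x - x_i)
    r_bar = sum(Fraction(sum_y - y, sum_x - x) for x, y in sample) / n
    return mu_x * r_bar + Fraction(n * (N - n + 1), N) * (m_y - r_bar * m_x)


samples = list(combinations(population, n))
expectation = sum(leave_one_out_estimator(s) for s in samples) / len(samples)
print(expectation, mu_y)
```

Since the averaged estimator no longer depends on the order of drawing, it is enough to average over unordered samples here.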


40.16 Turning now to regression estimators, it is natural to investigate the choice 
$a(p) = b_{yx}(p)$. (40.47) reduces to 

$$t_p = m_y - b_{yx}(p)(m_x - \mu_x) - \frac{(N-n)}{N}\frac{p}{(n-p)}\left\{[m_y(p) - m_y] - b_{yx}(p)[m_x(p) - m_x]\right\}. \tag{40.50}$$

Averaging simply replaces $b_{yx}(p)$ by its average. $p = 1$ is now impossible, since $b_{yx}(1)$ 
is then nugatory. As before, $p = n-1$ is the next simplest, involving the calculation 
of the regression coefficient $n$ times, omitting each of the observations in turn, and 
averaging to obtain $\bar b_{yx}(n-1)$. (40.50) reduces to 

$$\bar t_{n-1} = m_y - \bar b_{yx}(n-1)(m_x - \mu_x) - \frac{(N-n)}{Nn}\left\{\sum_{i=1}^{n} b_{yx}(n-1)\,x_i - n\,\bar b_{yx}(n-1)\,m_x\right\}, \tag{40.51}$$

where $x_i$ in the summation is the value omitted in calculating the $b_{yx}(n-1)$ which it 
multiplies. (40.51) is equivalent to the usual regression estimator (40.41) if all 
$b_{yx}(n-1)$ are the same, but not in general otherwise. However, when $n$ is large, the 
$b_{yx}(n-1)$ can vary very little, and the estimators differ correspondingly little. 
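
A corresponding check for the regression case is sketched below. It assumes that (40.51) reads $\bar t_{n-1} = m_y - \bar b(m_x - \mu_x) - \frac{N-n}{Nn}\{\sum_i b_i x_i - n\,\bar b\,m_x\}$, where $b_i$ is the LS coefficient computed with the $i$th observation omitted; the population (given distinct $x$ values so that every coefficient is defined) and all names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical (x, y) population with distinct x values so every slope is defined.
population = [(1, 3), (2, 4), (4, 9), (7, 13)]
N, n = len(population), 3
mu_x = Fraction(sum(x for x, _ in population), N)
mu_y = Fraction(sum(y for _, y in population), N)


def ls_slope(obs):
    """LS regression coefficient of y upon x."""
    k = len(obs)
    m_x = Fraction(sum(x for x, _ in obs), k)
    m_y = Fraction(sum(y for _, y in obs), k)
    s_xy = sum((x - m_x) * (y - m_y) for x, y in obs)
    s_xx = sum((x - m_x) ** 2 for x, _ in obs)
    return s_xy / s_xx


def leave_one_out_regression(sample):
    """Assumed reading of (40.51)."""
    m_x = Fraction(sum(x for x, _ in sample), n)
    m_y = Fraction(sum(y for _, y in sample), n)
    # slopes[i] is computed with observation i omitted
    slopes = [ls_slope(sample[:i] + sample[i + 1:]) for i in range(n)]
    b_bar = sum(slopes) / n
    weighted = sum(b * x for b, (x, _) in zip(slopes, sample))
    return (m_y - b_bar * (m_x - mu_x)
            - Fraction(N - n, N * n) * (weighted - n * b_bar * m_x))


samples = list(combinations(population, n))
expectation = sum(leave_one_out_regression(s) for s in samples) / len(samples)
print(expectation, mu_y)
```

When all the leave-one-out coefficients coincide, the term in braces vanishes and the function returns the ordinary regression estimator.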


Estimation of variance 
40.17 The sampling variance of the unbiassed estimators (40.47) cannot be generally 
investigated, since everything depends upon the choice of $a(p)$. However, if we 
modify the estimation scheme slightly, we can at once obtain estimators whose variance 
can be estimated. 

Suppose that the $n$ observations are split into $k$ subsamples as they are drawn, 
the $r$th subsample containing $n_r$ observations, $\sum_{r=1}^{k} n_r = n$. We write the partial sum 
$\sum_{r=1}^{s} n_r = n_{(s)}$, so that $n_{(1)} = n_1$ and $n_{(k)} = n$. We re-label the estimator (40.47) as 
$t(p, n)$ to signify that a subsample of $p$ is used in a sample of size $n$. Consider the 
sequence of $(k-1)$ estimators $t(n_{(1)}, n_{(2)}), t(n_{(2)}, n_{(3)}), \ldots, t(n_{(k-1)}, n_{(k)})$, in which each 
estimator uses the complete sample of the previous estimator as subsample. These 
estimators are all uncorrelated, for the operation of taking expectations may be split 
into a sequence of $k$ conditional expectations (cf. 39.35) corresponding to the divisions 
into the $k$ subsamples, and for $r < s$ 

$$E\left[\{t(n_{(r)}, n_{(r+1)}) - \mu_y\}\{t(n_{(s)}, n_{(s+1)}) - \mu_y\}\right] = E_1 \cdots E_{r+1}\left[\{t(n_{(r)}, n_{(r+1)}) - \mu_y\}\,E_{r+2} \cdots E_k\{t(n_{(s)}, n_{(s+1)}) - \mu_y\}\right] = 0,$$

since the inner conditional expectation of $t(n_{(s)}, n_{(s+1)}) - \mu_y$ is zero. 
Thus we may use the result of Exercise 39.12 to estimate the variance of the mean 
of the sequence of unbiassed uncorrelated estimators $t(n_{(r)}, n_{(r+1)})$. The estimators 
themselves need not be of the same form. 


40.18 In 40.17 we did not require that the $k$ subsamples be of equal size. We 
now suppose that they are, so that $n_r = n/k$. If we use each of the subsamples in 
turn to evaluate a particular $a(p) = a(n/k)$ in (40.47), and calculate $t(n/k, n)$ each time, 
its $k$ values will no longer be uncorrelated as in 40.17. Their mean is 

$$\bar t\left(\frac{n}{k}, n\right) = \frac{1}{k}\sum t\left(\frac{n}{k}, n\right) = m_y - \bar a\left(\frac{n}{k}\right)(m_x - \mu_x) + \frac{(N-n)}{N}\frac{1}{k(k-1)}\sum a\left(\frac{n}{k}\right)\left\{m_x\left(\frac{n}{k}\right) - m_x\right\}, \tag{40.52}$$

where a bar denotes averaging over the $k$ values obtained, and the summations are over 
the $k$ subsamples. The first two terms on the right of (40.52) are precisely of the form 
(40.46), but are not unbiassed because $\bar a(n/k)$ is calculated from the same sample as 
$m_x$; their expectation is $\mu_y - C\{\bar a(n/k), m_x\}$. 
The last term in (40.52) is evidently an unbiassed estimator of this covariance if the 
population is regarded as consisting of $Nk/n$ groups of size $n/k$, of which $k$ are selected 
at random for the sample. 


Modification of sampling scheme to eliminate bias 

40.19 The whole of our discussion of ratio and regression estimators in this 
chapter has been concerned with modifying the form of the estimator to eliminate or 
reduce bias in equal-probabilities sampling without replacement. Another way of 
achieving these objectives is to change the sampling scheme so that the original estimators 
are rendered unbiassed. 


(39.20) shows that an unbiassed estimator of $\mu_y$ is given by $\frac{1}{N}\sum_{i=1}^{n} \frac{y_i}{\pi_i}$ for any set of 
selection probabilities $\pi_i$. Suppose, then, that we choose 

$$\pi_i = n x_i \Big/ \sum_{j=1}^{N} x_j, \tag{40.53}$$

where we must now assume the auxiliary variable $x$ to be positive so that (40.53) is a 
set of probabilities. It then follows at once that $\mu_x m_{y/x}$ will be an unbiassed estimator 
of $\mu_y$. In other words, for this sampling scheme, the estimator defined at (40.2) is exactly 
unbiassed. On the other hand, if we regard the set of $\binom{N}{n}$ possible samples as the 
population from which one member is to be drawn, and work in terms of the means 
$m_y$, $m_x$ of these samples, the same argument shows that if we make the single selection 
with 

$$\pi = m_x \Big/ \sum m_x, \tag{40.54}$$

the summation being over all $\binom{N}{n}$ possible samples, the estimator (39.20) becomes 

$$\frac{m_y}{m_x}\,\mu_x,$$

so that the estimator defined at (40.1) becomes exactly unbiassed, as first observed by Lahiri (1951). 
The variances of these estimators, and unbiassed estimates of them, are obtainable 
as usual from (39.23-4). In the case of (40.1), of course, at least two samples (or random 
subsamples of one sample) are necessary for variance estimation to be possible. 
Nanjamma et al. (1959) discuss the general problem of modifying the sampling 
scheme to render ratio estimators unbiassed, with applications to several types of survey 
design. See also Pathak (1964a). 
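
Lahiri's observation can be verified directly on a small population by listing every possible sample together with its selection probability. In the sketch below, each whole sample is selected with probability proportional to its mean of $x$, which is how (40.54) is read here; the population and all names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

population = [(2, 2), (2, 6), (4, 6), (6, 10)]   # illustrative (x, y) values, x > 0
N, n = len(population), 2
mu_x = Fraction(sum(x for x, _ in population), N)
mu_y = Fraction(sum(y for _, y in population), N)

samples = list(combinations(population, n))
total_x = {s: sum(x for x, _ in s) for s in samples}
grand_total = sum(total_x.values())

# Probability of selecting each whole sample, proportional to its mean of x.
prob = {s: Fraction(total_x[s], grand_total) for s in samples}


def ratio_estimator(sample):
    # Assumed form of (40.1): (m_y / m_x) * mu_x.
    m_x = Fraction(sum(x for x, _ in sample), n)
    m_y = Fraction(sum(y for _, y in sample), n)
    return m_y / m_x * mu_x


modified_scheme = sum(prob[s] * ratio_estimator(s) for s in samples)
equal_probabilities = sum(ratio_estimator(s) for s in samples) / len(samples)
print(modified_scheme, equal_probabilities, mu_y)
```

The first printed value is the expectation under the modified scheme and the second is the expectation under ordinary equal-probabilities sampling, which shows the bias that the change of scheme removes.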


Stratified and multi-stage sampling 

40.20 Any ratio or regression estimator may be applied separately within each 
of a number of strata, provided that the population mean of x is known within each 
stratum. Alternatively, a single ratio or regression estimator may be applied using 
the combined results from all strata. We should expect the former procedure to be 
the more efficient in general. The details are given by Cochran (1963) for biassed 
ratio and regression estimators. 

Unbiassed stratified ratio estimators are discussed by Nieto de Pascual (1961) 
and W. H. Williams (1961) in the univariate case and by Olkin (1958) for multivariate 
situations. Robson and Vithayasai (1961) consider a stratification-like situation where 
y and x can be expressed as the sum of k corresponding components. Kish and Hess 
(1959) derive asymptotic formulae for the variance of the biassed combined ratio 
estimator in stratified multi-stage sampling. 


40.21 Durbin (1953) pointed out that since, from (40.30), 

$$\frac{m_y}{m_x} = \frac{\mu_y}{\mu_x} + \frac{m_y - (\mu_y/\mu_x)\,m_x}{\mu_x} + o(n^{-1/2}), \tag{40.55}$$

the ratio of sample means is asymptotically linear in $y - (\mu_y/\mu_x)x = z$, so that we have 

$$\frac{m_y}{m_x} \sim \frac{\mu_y}{\mu_x} + \frac{m_z}{\mu_x}. \tag{40.56}$$

It follows that for the estimation of $V(m_y/m_x)$ in multi-stage sampling, the discussion of 
39.45-50 applies asymptotically and the Yates-Durbin rule of 39.46 and 39.49 may 
be used in large samples. This will also be true separately within each stratum of a 
multi-stage design. The same applies for regression estimators. 


Two-phase sampling 

40.22 Our discussions of stratified sampling (39.15-27), and of the use of an 
auxiliary variable to improve estimation efficiency in this chapter, both presupposed 
some knowledge of the population in order to make unbiassed estimation possible. In 
the latter case, it was $\mu_x$ which had to be known, and in the former the relative sizes of 
the strata, N;/N, which are required to define the estimator (39.38). If this essential 
information is not available, a procedure which sometimes suggests itself on practical 
grounds is to carry out a preliminary equal-probabilities random sample to obtain it, 
and follow this by the main sample devoted to the original purpose of estimating the 
population mean. Clearly, such a procedure will only be economically acceptable if 
the cost of the preliminary sample is small relative to the gain in efficiency achieved in 
the main sample as a result—we make this point more precise later. 

A sampling scheme of this kind is called two-phase sampling. (We do not use the 
older name double sampling, which has already been used in Chapter 34 for a sequential 
method whose aim is to achieve a confidence interval of prescribed length and coefficient. 
This is certainly not our purpose here, where we aim primarily to improve efficiency 
of estimation at the second phase by collecting auxiliary information at the first phase.) 
Two-phase sampling is distinguished from two-stage sampling by the fact that it uses 
the same sampling units at each phase of sampling. 


40.23 Following Neyman (1938), who first solved the problem, we consider the 
stratification problem first. We wish to stratify into a fixed number $k$ of defined strata, 
but are ignorant of the population proportions $N_l/N = W_l$ in these strata and therefore 
cannot use the estimator (39.38). Accordingly we take a preliminary equal-probabilities 
sample of size $n_1$, which is found to be distributed over the $k$ desired strata with 
frequencies $n_{11}, n_{12}, \ldots, n_{1k}$, where $\sum_{l=1}^{k} n_{1l} = n_1$. The proportions $w_{1l} = n_{1l}/n_1$ are, of 
course, unbiassed estimators of the population proportions $W_l$ and it is therefore 
natural to use as our estimator of $\mu$ 

$$\hat\mu_{2s} = \sum_{l=1}^{k} w_{1l}\,m_{2l}, \tag{40.57}$$

where the $m_{2l}$ are the sample stratum means in the second (main) sample, of size $n_2$, 
with $n_{2l}$ observations in the $l$th stratum. The question now arises whether $n_{2l}$ is to 
be a subsample of $n_{1l}$ or whether it is to be entirely independently selected. In practice, 
the former is much more likely, since if $N_l$ is unknown no complete listing of the strata 
can be available and second-phase sampling will be based on the random sample in 
each stratum obtained at the first phase. Furthermore, although the first phase is 
logically prior to the second, the two phases can sometimes be carried out simultaneously 
if the second uses a subsample of the first. 

We assume that the $n_{1l}$ are certain to be so large, in relation to the intended values 
of the $n_{2l}$, that the observed values of the $n_{1l}$ are no restriction upon the fixing of the 
$n_{2l}$ in advance. 


40.24 We now (cf. 39.35) use the symbol $E_2$ to denote taking expectations at the 
second phase, conditional upon the first-phase results being fixed, and $E_1$ to denote 
expectations at the first phase. Using (39.72), the expectation of (40.57) is 

$$E(\hat\mu_{2s}) = E_1\{E_2(\hat\mu_{2s})\} = \sum_l E_1\{w_{1l}\,E_2(m_{2l})\} = \sum_l E_1\{w_{1l}\,\mu_l\} = \sum_l W_l\,\mu_l = \mu, \tag{40.58}$$

so that $\hat\mu_{2s}$ is unbiassed. Its variance, from (39.73), is 

$$V(\hat\mu_{2s}) = E_1\{V_2(\hat\mu_{2s})\} + V_1\{E_2(\hat\mu_{2s})\}. \tag{40.59}$$

The first term on the right of (40.59) is evaluated by observing that just as at (39.39) 

$$V_2(\hat\mu_{2s}) = V_2\Big(\sum_l w_{1l}\,m_{2l}\Big) = \sum_l w_{1l}^2\,\frac{\sigma_l^2}{n_{2l}}\left(1 - \frac{n_{2l}}{N_l}\right),$$

and hence 

$$E_1\{V_2(\hat\mu_{2s})\} = \sum_l E_1(w_{1l}^2)\,\frac{\sigma_l^2}{n_{2l}}\left(1 - \frac{n_{2l}}{N_l}\right). \tag{40.60}$$

If we now assume each $N_l$ to be very large, the last factor on the right of (40.60) differs 
negligibly from unity. What is more, the first-phase sampling is now effectively multinomial 
estimation of proportions, so that (5.80) applies and (40.60) becomes 

$$E_1\{V_2(\hat\mu_{2s})\} = \sum_l \left\{W_l^2 + \frac{W_l(1-W_l)}{n_1}\right\}\frac{\sigma_l^2}{n_{2l}}. \tag{40.61}$$

The second term on the right of (40.59) is, using (10.16) and (5.80), 

$$V_1\{E_2(\hat\mu_{2s})\} = V_1\Big(\sum_l w_{1l}\,\mu_l\Big) = \sum_l \mu_l^2\,V(w_{1l}) + \mathop{\sum\sum}_{l \ne p} \mu_l\,\mu_p\,C(w_{1l}, w_{1p})$$

$$= \sum_l \mu_l^2\,\frac{W_l(1-W_l)}{n_1} - \mathop{\sum\sum}_{l \ne p} \mu_l\,\mu_p\,\frac{W_l W_p}{n_1} = \frac{1}{n_1}\left\{\sum_l W_l\,\mu_l^2 - \Big(\sum_l W_l\,\mu_l\Big)^2\right\}. \tag{40.62}$$

Putting (40.61-2) into (40.59), we find, since $\sum_l W_l\,\mu_l = \mu$, 

$$V(\hat\mu_{2s}) = \sum_l \left\{W_l^2 + \frac{W_l(1-W_l)}{n_1}\right\}\frac{\sigma_l^2}{n_{2l}} + \frac{1}{n_1}\sum_l W_l(\mu_l - \mu)^2. \tag{40.63}$$

It will be recalled from (39.46) and 39.19 that the last term on the right of (40.63) 
expresses the gain in precision of a USF stratified sample over an unstratified sample 
when all stratum sizes are large and $n_1$ is sample size. 


40.25 If $n_1 \to \infty$, so that the $W_l$ are effectively known, (40.63) reduces to the 
usual stratified sampling variance formula (39.39). Even for moderate $n_1$, the term 
$W_l(1-W_l)/n_1$, which cannot exceed $1/(4n_1)$, will usually be negligible compared with 
$W_l^2$. From 39.18, we recall that $\sum_l W_l(\mu_l - \mu)^2 = \sigma^2 - \sum_l W_l\,\sigma_l^2$ for $N_l$ large, so that 
(40.63) may be approximated by 

$$V(\hat\mu_{2s}) = \frac{\sigma^2 - \sum_l W_l\,\sigma_l^2}{n_1} + \sum_l \frac{W_l^2\,\sigma_l^2}{n_{2l}}. \tag{40.64}$$

An almost unbiassed estimator of (40.64) is obtained by substituting $w_{1l}$ for $W_l$, $s^2$ for 
$\sigma^2$, and $s_l^2$ for $\sigma_l^2$. 


40.26 Suppose now that the cost function for the two-phase sample is 

$$C = c_0 + n_1 c_1 + \sum_{l=1}^{k} n_{2l}\,c_{2l}. \tag{40.65}$$

(40.64-5) are of the form (39.50-1). It follows from (39.53) that the sample sizes 
which minimize $V(\hat\mu_{2s})$ for fixed $C$ (or vice versa) are 

$$n_1^2 \propto \frac{\sigma^2 - \sum_l W_l\,\sigma_l^2}{c_1}, \qquad n_{2l}^2 \propto \frac{W_l^2\,\sigma_l^2}{c_{2l}}, \tag{40.66}$$

the constant of proportionality being obtained by (40.65) or (40.64), whichever is fixed. 

(40.66) shows that at the second phase, observations should be distributed between 
the strata just as in ordinary stratified sampling allocation at (39.55) (though it must 
be remembered that only the neglect of a term of order $n_1^{-1}$ has produced this simple 
result). The square of the first-phase sample size is directly proportional to the numerator 
of the first term on the right of (40.64) (which is the excess variance resulting from the need 
to estimate the $W_l$ at the first phase) and inversely proportional to the cost of sampling, 
both considerations in accord with intuition. 


40.27 Although the intention of our two-phase sampling is to improve estimation 
efficiency by use of stratification, we recall from 39.18 that even when the $W_l$ are 
known precisely, the best stratification may cause a loss of efficiency, though we saw 
in 39.19 that this could not happen if all the $N_l$ were large enough. However, the 
additional component of sampling variance due to the estimation of the $W_l$ at the 
first-phase sampling now opens the possibility of a loss of efficiency even for large $N_l$. 

Consider an equal-probabilities unstratified random sample of size $n$, with $n_l$ 
observations falling into the $l$th stratum. It is reasonable to assume that the overhead 
cost of the sample will be the same $c_0$ as in (40.65), and that the cost of an observation 
in the $l$th stratum will also remain unchanged at $c_{2l}$. Thus the cost function is 

$$C_R = c_0 + \sum_l n_l\,c_{2l}.$$

However, $n_l$ is now a random variable with expectation $nW_l$, so that we must work 
with the expected cost 

$$E(C_R) = c_0 + n\sum_l W_l\,c_{2l}. \tag{40.67}$$

If $E(C_R)$ is to equal $C$ at (40.65), we must have 

$$n = \frac{n_1 c_1 + \sum_l n_{2l}\,c_{2l}}{\sum_l W_l\,c_{2l}},$$

and thus the variance of the unstratified estimator will be (for large $N$) 

$$V(m_R) = \frac{\sigma^2}{n} = \frac{\sigma^2\sum_l W_l\,c_{2l}}{n_1 c_1 + \sum_l n_{2l}\,c_{2l}}. \tag{40.68}$$

The ratio of two-phase stratified to unstratified sampling variance is, from (40.64) 
and (40.68), 

$$\frac{V(\hat\mu_{2s})}{V(m_R)} = \frac{\left\{\dfrac{\sigma^2 - \sum_l W_l\,\sigma_l^2}{n_1} + \sum_l \dfrac{W_l^2\,\sigma_l^2}{n_{2l}}\right\}\left(n_1 c_1 + \sum_l n_{2l}\,c_{2l}\right)}{\sigma^2\sum_l W_l\,c_{2l}}. \tag{40.69}$$

The numerator of (40.69) is the product $V(\hat\mu_{2s})(C - c_0)$, which is minimized when $n_1$ 
and $n_{2l}$ are chosen to satisfy (40.66). By (39.52), this minimum value is given by 

$$\frac{\min V(\hat\mu_{2s})}{V(m_R)} = \frac{\left[\{(\sigma^2 - \sum_l W_l\,\sigma_l^2)\,c_1\}^{1/2} + \sum_l W_l\,\sigma_l\,c_{2l}^{1/2}\right]^2}{\sigma^2\sum_l W_l\,c_{2l}}. \tag{40.70}$$

This seems to be the most useful form for the ratio of variances. If we again consider 
the numerator of (40.70), we see by the Cauchy inequality that it is no greater than 

$$\left\{\Big(\sigma^2 - \sum_l W_l\,\sigma_l^2\Big) + \sum_l W_l\,\sigma_l^2\right\}\left\{c_1 + \sum_l W_l\,c_{2l}\right\},$$

so that (40.70) gives 

$$\frac{\min V(\hat\mu_{2s})}{V(m_R)} \le 1 + \frac{c_1}{\sum_l W_l\,c_{2l}}. \tag{40.71}$$

Thus if $c_1 = 0$, two-phase stratified sampling with MV allocation of sample sizes is 
never worse than unstratified sampling with the same expected cost. But if $c_1 = 0$, 
we can estimate the $W_l$ accurately at zero cost, so this is effectively ordinary stratified 
sampling. We have thus verified the conclusion of 39.19 with the additional 
consideration of variable costs in the different strata. 

If $c_1 > 0$ in (40.70), it is possible for the unstratified sample to be more efficient, 
but (40.70) is evidently an increasing function of $c_1$ and if $c_1$ is small compared to 
the weighted average $\sum_l W_l\,c_{2l}$, the right-hand side of (40.71) can exceed unity by very 
little, so that there is, at worst, little efficiency to be lost by properly allocated 
two-phase stratified sampling. As a simple unfavourable numerical example, put $c_1 = 1$, 
$c_{2l} = 6$, all $l$, $\sigma^2 = 10$, $\sigma_l^2 = 6$, all $l$. The value of (40.70) is then $(2+6)^2/(10 \times 6) = 1.07$. 
If instead $c_{2l} = 9$, (40.70) becomes $(1+7.35)^2/(10 \times 9) = 0.77$. 
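
The allocation rule and the minimum variance ratio are straightforward to evaluate numerically. The sketch below reads (40.66) as $n_1^2 \propto (\sigma^2 - \sum W_l\sigma_l^2)/c_1$, $n_{2l}^2 \propto W_l^2\sigma_l^2/c_{2l}$, and (40.70) as $[\{(\sigma^2 - \sum W_l\sigma_l^2)c_1\}^{1/2} + \sum W_l\sigma_l c_{2l}^{1/2}]^2 / (\sigma^2 \sum W_l c_{2l})$; these readings, the choice of two equal strata and all function names are assumptions made for illustration.

```python
from math import sqrt


def min_variance_ratio(sigma2, strata, c1):
    """Minimum of V(two-phase stratified) / V(unstratified), as read from (40.70).

    strata is a list of (W_l, sigma_l squared, c_2l) triples with the W_l summing to 1.
    """
    within = sum(w * s2 for w, s2, _ in strata)
    numerator = (sqrt((sigma2 - within) * c1)
                 + sum(w * sqrt(s2 * c2) for w, s2, c2 in strata)) ** 2
    return numerator / (sigma2 * sum(w * c2 for w, _, c2 in strata))


def allocation(sigma2, strata, c1, scale=1.0):
    """Sample sizes proportional to the square roots of the right-hand sides of (40.66)."""
    within = sum(w * s2 for w, s2, _ in strata)
    n1 = scale * sqrt((sigma2 - within) / c1)
    n2 = [scale * w * sqrt(s2 / c2) for w, s2, c2 in strata]
    return n1, n2


def variance_ratio(sigma2, strata, c1, n1, n2):
    """Ratio (40.69) for arbitrary sample sizes."""
    within = sum(w * s2 for w, s2, _ in strata)
    variance = (sigma2 - within) / n1 + sum(
        w**2 * s2 / m for (w, s2, _), m in zip(strata, n2))
    cost = n1 * c1 + sum(m * c2 for (_, _, c2), m in zip(strata, n2))
    return variance * cost / (sigma2 * sum(w * c2 for w, _, c2 in strata))


# First numerical example of the text: c_1 = 1, c_2l = 6, sigma^2 = 10, sigma_l^2 = 6.
strata = [(0.5, 6.0, 6.0), (0.5, 6.0, 6.0)]   # two equal strata, an arbitrary choice
best = min_variance_ratio(10.0, strata, c1=1.0)
n1, n2 = allocation(10.0, strata, c1=1.0, scale=50.0)
at_optimum = variance_ratio(10.0, strata, 1.0, n1, n2)
elsewhere = variance_ratio(10.0, strata, 1.0, 50.0, [25.0, 25.0])
print(round(best, 2), round(at_optimum, 2), round(elsewhere, 2))
```

The second printed value coincides with the first, confirming that the allocation attains the minimum, while the third shows a non-optimal allocation doing worse.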


40.28 When the first phase of sampling is being used to estimate the mean of an 
auxiliary variable, $\mu_x$, for a ratio or regression estimator at the second phase, (39.73) is used 
to evaluate the sampling variance, as in 40.24. We shall consider only the simplest 
case, using the biassed ratio estimator (40.1). In two-phase sampling this is 

$$\hat\mu_{2R} = \frac{m_y^{(2)}}{m_x^{(2)}}\,m_x^{(1)},$$

where we now use superscripts to denote phases. If the two phases are independent 
equal-probabilities samples, we have, using 40.2, 

$$E(\hat\mu_{2R}) = E_1\{m_x^{(1)}\,E_2(m_y^{(2)}/m_x^{(2)})\} = E_1\{m_x^{(1)}[\mu_y/\mu_x + O(n_2^{-1})]\} = \mu_y + O(n_2^{-1}).$$

(39.73) gives 

$$V(\hat\mu_{2R}) = E_1\{V_2(\hat\mu_{2R})\} + V_1\{E_2(\hat\mu_{2R})\} = E_1\{(m_x^{(1)})^2\}\,V(m_y^{(2)}/m_x^{(2)}) + V\{m_x^{(1)}\mu_y/\mu_x\}, \tag{40.72}$$

where we neglect terms of relative order $n_2^{-1}$. Thus 

$$V(\hat\mu_{2R}) = V(m_y^{(2)}/m_x^{(2)})\left(\mu_x^2 + \frac{V(x)}{n_1}\right) + \left(\frac{\mu_y}{\mu_x}\right)^2\frac{V(x)}{n_1},$$

and using (40.25) this becomes 

$$V(\hat\mu_{2R}) = \frac{1}{n_2}\left\{V(y) + \left(\frac{\mu_y}{\mu_x}\right)^2 V(x) - 2\,\frac{\mu_y}{\mu_x}\,C(y, x)\right\}\left(1 + \frac{V(x)}{n_1\mu_x^2}\right) + \left(\frac{\mu_y}{\mu_x}\right)^2\frac{V(x)}{n_1}. \tag{40.73}$$

The term in $1/n_2$ on the right of (40.73) is simply (40.25) applied to the second-phase 
sample. As in (40.63), the first-phase sampling introduces a term in $1/n_1$ inflating 
this contribution, as well as a new contribution of order $1/n_1$. Since we have already 
neglected terms of relative order $1/n_2$, we also (since we assume $n_1 \gg n_2$) neglect the 
term in $1/n_1 n_2$, obtaining the approximation 

$$V(\hat\mu_{2R}) = \frac{1}{n_2}\left\{V(y) + \left(\frac{\mu_y}{\mu_x}\right)^2 V(x) - 2\,\frac{\mu_y}{\mu_x}\,C(y, x)\right\} + \frac{1}{n_1}\left(\frac{\mu_y}{\mu_x}\right)^2 V(x). \tag{40.74}$$

If, instead of being independent, the second-phase sample is a subsample of the first, 
(40.72) is modified, $\mu_y/\mu_x$ there being replaced by $m_y^{(1)}/m_x^{(1)}$, so that the second term 
on its right-hand side becomes simply $V(m_y^{(1)}) = V(y)/n_1$. If $n_1$ is very much larger 
than $n_2$, the first term on the right of (40.72) has the same value as previously to our 
order of approximation, but if $n_2/n_1$ is appreciable, the approximation is improved 
by a correction $(1 - n_2/n_1)$ applied to the first term in (40.74), which then becomes 

$$V(\hat\mu_{2R}) = \frac{1}{n_2}\left(1 - \frac{n_2}{n_1}\right)\left\{V(y) + \left(\frac{\mu_y}{\mu_x}\right)^2 V(x) - 2\,\frac{\mu_y}{\mu_x}\,C(y, x)\right\} + \frac{V(y)}{n_1}. \tag{40.75}$$
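
The two approximations for the two-phase ratio estimator differ only in how the first-phase contribution enters. The sketch below assumes that (40.74) reads $\frac{1}{n_2}D + \frac{1}{n_1}(\mu_y/\mu_x)^2 V(x)$ and that (40.75) reads $(\frac{1}{n_2} - \frac{1}{n_1})D + V(y)/n_1$, where $D = V(y) + (\mu_y/\mu_x)^2 V(x) - 2(\mu_y/\mu_x)C(y, x)$; the moments, sample sizes and function name are hypothetical.

```python
def two_phase_ratio_variance(v_y, v_x, c_yx, mu_y, mu_x, n1, n2, subsample=False):
    """Approximate variance of the two-phase ratio estimator.

    Uses the form read from (40.74) for independent phases and from (40.75)
    when the second-phase sample is a subsample of the first.
    """
    r = mu_y / mu_x
    d = v_y + r**2 * v_x - 2 * r * c_yx      # the bracketed term of (40.74)
    if subsample:
        return (1 / n2 - 1 / n1) * d + v_y / n1
    return d / n2 + r**2 * v_x / n1


# Hypothetical moments and sample sizes, for illustration only.
args = dict(v_y=4.0, v_x=1.0, c_yx=1.5, mu_y=10.0, mu_x=5.0, n1=100, n2=20)
independent = two_phase_ratio_variance(**args)
nested = two_phase_ratio_variance(**args, subsample=True)
print(independent, nested)
```

Which of the two is smaller depends on the sign of $C(y, x) - (\mu_y/\mu_x)V(x)$, so no general ordering should be read into this single example.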


40.29 Cochran (1963) and Yates (1960) give details of application of two-phase 
methods to regression estimators, although restrictive assumptions are necessary to 
obtain useful variance formulae. 

Two-phase sampling generalizes naturally to multi-phase sampling, but little theoretical 
or practical work has been carried out on this more general procedure. 

S. P. Ghosh (1963a) considers a form of two-phase sampling where the object of 
the first phase is to form clusters for the second-phase sampling. 

Raj (1964) discusses the case where the first phase is used to determine probabilities 
of selection to be used at the second phase, and M. P. Singh (1967) shows that the estimator 
(39.27) is preferable to some others in this situation. 


Domains of study 

40.30 In our discussions of ratio estimation, we have allowed the auxiliary variable, 
x, to be quite general. In practice, one of the most important situations is that in 
which x is a 0-1 variable which counts whether the corresponding value of y is not, 
or is, a member of a certain sub-group of the population. Among the situations falling 
into this class are the following: 


(a) A population is sampled, but we are from the outset only interested in part of it; 
e.g. the human population aged 21 and over is sampled, but we are interested only 
in the ages 21-65. Here the sample size $n$ for the population of interest is clearly 
a random variable, and the sample mean for any variable measured in this population 
is of the form $\sum_i y_i \big/ \sum_i x_i$, where $x_i$ is 0 or 1 as above. If the population mean of $x$ 
(i.e. the proportion of the population aged 21-65) is known, all our foregoing theory 
can be applied. 

(b) We are interested in the entire population from which we sample, but only part 
of the selected sample yields observations, owing to non-response (in human popula- 
tions, especially), loss of records, or incomplete fieldwork. Again, n is a random 
variable, and the remarks under (a) apply. 

(c) We obtain observations from the whole sample taken from the population of interest, 
but we wish to evaluate the results for sub-groups of this population; e.g. we have 
a sample from a human population, and wish to calculate certain statistics for men 
only and for women only. If we had stratified the sample in advance into men and 
women, no new point would have arisen, since sample sizes for men and for women 
would be fixed. However, such stratification is not usually possible, so that these 
sample sizes are random variables (though their sum is not, in this simple case). 
More generally, the sample size in any unpredesignated sub-group must be a random 
variable. 


40.31 The sub-group of interest is called a domain (of study). Of course, a stratum 
may itself be a domain, but no new theory is then required. We shall use domain to 
mean a sub-group whose sample frequency is a random variable, whatever the reason. 

Domains frequently cut across the strata and the various stage-units of a sample, 
and it is here that new points arise. Yates (1960) gives (as also in earlier editions of 
his book) a number of formulae for domains cutting across strata, for which Cochran 
(1963) gives some of the derivations. Durbin (1958) treats these and some multi-stage 
situations. Hartley (1959) also derives some of Yates’ results for covariances of domain 
means, and gives some further results. Our treatment follows Durbin (1958). 


Domains across strata 
40.32 Suppose, as in Chapter 39, that we have $k$ strata with population frequencies 
$N_l$, $l = 1, 2, \ldots, k$ and $\sum_l N_l = N$, while in the sample the stratum frequencies are 
$n_l$ with $\sum_l n_l = n$. Now consider a particular domain, say $d$. We denote the population 
frequency in $d$ in the $l$th stratum by $N_l^{(d)}$ with $\sum_l N_l^{(d)} = N^{(d)}$ for the domain frequency 
in the entire population, while $n_l^{(d)}$ and $\sum_l n_l^{(d)} = n^{(d)}$ are similarly defined in the sample. 
Note that whereas the population stratum frequencies $N_l$ are known, the population 
stratum domain frequencies $N_l^{(d)}$ will generally not be known. Of course, unstratified 
random sampling is the case $k = 1$. 

(*) There is a complication here in the case of non-response, since non-response may be 
correlated with the value of y, so that the responding group cannot provide an unbiassed estimator 
of y for the population as a whole. 

We define the variable $y^{(d)}$ to be equal to the observed variable $y$ for domain members, 
and to be zero for others. Thus 

$$y_{lj}^{(d)} = h_{lj}\,y_{lj}, \tag{40.76}$$

where 

$$h_{lj} = \begin{cases} 1 & \text{within the domain,} \\ 0 & \text{outside the domain.} \end{cases} \tag{40.77}$$

We then have 

$$N_l^{(d)} = \sum_{j=1}^{N_l} h_{lj}, \quad N^{(d)} = \sum_{l=1}^{k} N_l^{(d)} = \sum_{l=1}^{k}\sum_{j=1}^{N_l} h_{lj}, \qquad n_l^{(d)} = \sum_{j=1}^{n_l} h_{lj}, \quad n^{(d)} = \sum_{l=1}^{k} n_l^{(d)} = \sum_{l=1}^{k}\sum_{j=1}^{n_l} h_{lj}. \tag{40.78}$$

We further define the domain means within strata, 

$$\mu_l^{(d)} = \sum_{j=1}^{N_l} y_{lj}^{(d)} \Big/ N_l^{(d)}, \tag{40.79}$$

and the overall domain mean 

$$\mu^{(d)} = \sum_{l=1}^{k}\sum_{j=1}^{N_l} y_{lj}^{(d)} \Big/ N^{(d)} = \sum_{l} N_l^{(d)}\,\mu_l^{(d)} \Big/ N^{(d)}. \tag{40.80}$$

40.33 We now seek to estimate $\mu^{(d)}$ at (40.80). Consider first the case where the 
sampling is with equal probabilities using a USF, say $f = n/N$ as in Chapter 39. 
The ratio estimator 

$$m^{(d)} = \sum_l\sum_j y_{lj}^{(d)} \Big/ n^{(d)} = \sum_l\sum_j y_{lj}^{(d)} \Big/ \sum_l\sum_j h_{lj} \tag{40.81}$$

is the sample analogue of $\mu^{(d)}$. It is in essence (40.1) with numerator and denominator 
separately summed across strata, i.e. it is an example of the "combined" ratio 
estimator referred to in 40.20. Using the analogues of (40.3) and (40.5), we see that 
to the first order of approximation 

$$E(m^{(d)}) = E\Big(\sum_l\sum_j y_{lj}^{(d)}\Big) \Big/ E(n^{(d)}) = \mu^{(d)}, \tag{40.82}$$

using (40.78) and (40.80). 


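40.34 To find the variance of $m^{(d)}$, we put 

$$z_{lj} = h_{lj}(y_{lj} - \mu^{(d)}) = y_{lj}^{(d)} - h_{lj}\,\mu^{(d)}, \tag{40.83}$$

so that, using (40.78) and (40.81), 

$$m^{(d)} - \mu^{(d)} = \sum_l\sum_j z_{lj} \Big/ n^{(d)}. \tag{40.84}$$

(40.84) is asymptotically 

$$m^{(d)} - \mu^{(d)} \sim \sum_l\sum_j z_{lj} \Big/ E(n^{(d)}) = \sum_l\sum_j z_{lj} \Big/ \left(n\,\frac{N^{(d)}}{N}\right). \tag{40.85}$$

Thus 

$$V(m^{(d)}) \sim \left(\frac{N}{N^{(d)}}\right)^2 V\left(\frac{1}{n}\sum_l\sum_j z_{lj}\right). \tag{40.86}$$

The variance on the right of (40.86) is that of a mean in a USF stratified sample. By 
(39.45), therefore, we have 

$$V(m^{(d)}) \sim \frac{N-n}{n(N^{(d)})^2}\sum_l N_l\,\sigma_l^2(z) = \frac{N-n}{n(N^{(d)})^2}\sum_l\left\{\sum_{j=1}^{N_l} z_{lj}^2 - \Big(\sum_{j=1}^{N_l} z_{lj}\Big)^2 \Big/ N_l\right\}, \tag{40.87}$$

where we have ignored the fact that $\sigma_l^2(z)$ has $N_l - 1$ as divisor rather than $N_l$. From 
(40.83), (40.87) may be written 

$$V(m^{(d)}) \sim \frac{N-n}{n(N^{(d)})^2}\sum_l\left[\sum_{j=1}^{N_l} h_{lj}(y_{lj} - \mu^{(d)})^2 - \frac{(N_l^{(d)})^2}{N_l}(\mu_l^{(d)} - \mu^{(d)})^2\right], \tag{40.88}$$

using (40.78-9) and the fact that $h_{lj}^2 = h_{lj}$. The effect of the term $h_{lj}$ in the first 
summation over $j$ on the right of (40.88) is to convert the $y_{lj}$ in the succeeding parenthesis 
to $y_{lj}^{(d)}$ by (40.76), and to leave $\mu^{(d)}$ there unchanged, since by (40.80) this is a linear 
function of the $y_{lj}^{(d)}$ and $h_{lj}\,y_{lj}^{(d)} = y_{lj}^{(d)}$. This first summation may therefore be written 

$$\sum_{j=1}^{N_l} h_{lj}(y_{lj} - \mu^{(d)})^2 = \sum_j (y_{lj}^{(d)} - \mu^{(d)})^2 = \sum_j (y_{lj}^{(d)} - \mu_l^{(d)})^2 + N_l^{(d)}(\mu_l^{(d)} - \mu^{(d)})^2, \tag{40.89}$$

by the usual sum of squares identity, the summations without limits being over the 
domain members of the $l$th stratum. Putting (40.89) into (40.88), we obtain 

$$V(m^{(d)}) \sim \frac{N-n}{n(N^{(d)})^2}\sum_l\sum_j (y_{lj}^{(d)} - \mu_l^{(d)})^2 + \frac{N-n}{n(N^{(d)})^2}\sum_l N_l^{(d)}\left(1 - \frac{N_l^{(d)}}{N_l}\right)(\mu_l^{(d)} - \mu^{(d)})^2. \tag{40.90}$$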


40.35 The first term on the right of (40.90) is, if we write n^{(d)}/N^{(d)} = n/N to
our order of approximation,

    \frac{N^{(d)} - n^{(d)}}{n^{(d)} N^{(d)}} \sum_{l=1}^{k} \frac{N_l^{(d)}}{N^{(d)}} \sigma_l^2(y^{(d)}).    (40.91)

As is evident from our derivation of (40.87) from (39.45), (40.91) is exactly the
"variance" we should have arrived at if the stratum domain frequencies n_l^{(d)} had been
(wrongly) regarded as fixed. The second term in (40.90) therefore indicates the
increase in variance attributable to variation in the n_l^{(d)}. This will be large if the
stratum domain means \mu_l^{(d)} differ substantially, particularly if the fractions of the strata
frequencies within the domain, N_l^{(d)}/N_l, are small.

If we drop the factor (1 - N_l^{(d)}/N_l) in the second term on the right of (40.90), and
use the approximation in (40.91), the whole of (40.90) is to our degree of approximation
identical with the first formula of 39.18, which will be seen to be the unstratified sampling
variance. Thus if all the fractions N_l^{(d)}/N_l are small, we see that the variation in the
n_l^{(d)} effectively removes the whole benefit of stratification from the sampling variance
of the estimator. Only if the domain bulks large within at least some of the strata is
much of the benefit retained.

An estimator of the sampling variance (40.90) can be derived in exactly the same 
way as (40.90) itself was—the details are left to the reader as Exercise 40.15. 
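
The resolution of the variance into a "fixed frequency" part and a part due to variation in the stratum domain frequencies can be checked numerically. The following is a minimal sketch, not taken from the text: it assumes a small population held as lists of (y, h) pairs, with h the 0-1 domain indicator, and the function name domain_variance is purely illustrative. It evaluates the first-order variance once from the z-values as at (40.87) and once as the two terms of (40.90).

```python
def domain_variance(strata, n):
    """First-order variance of the domain ratio estimator, computed two ways.

    strata: list of strata; each stratum is a list of (y, h) pairs, where
            h = 1 if the unit belongs to the domain and 0 otherwise.
    n:      total sample size under a uniform sampling fraction.
    Returns (v_from_z, within_term, between_term).
    """
    N = sum(len(s) for s in strata)
    N_d = sum(h for s in strata for _, h in s)
    mu_d = sum(y * h for s in strata for y, h in s) / N_d
    factor = (N - n) / (n * N_d ** 2)

    v_from_z = 0.0  # bracket of (40.87), built from z = h * (y - mu_d)
    within = 0.0    # first term of (40.90)
    between = 0.0   # second term of (40.90)
    for s in strata:
        N_l = len(s)
        z = [h * (y - mu_d) for y, h in s]
        v_from_z += sum(v * v for v in z) - sum(z) ** 2 / N_l
        N_ld = sum(h for _, h in s)
        if N_ld == 0:
            continue  # a stratum with no domain members contributes nothing
        mu_ld = sum(y * h for y, h in s) / N_ld
        within += sum(h * (y - mu_ld) ** 2 for y, h in s)
        between += N_ld * (1 - N_ld / N_l) * (mu_ld - mu_d) ** 2
    return factor * v_from_z, factor * within, factor * between
```

The first returned value should equal the sum of the other two, and the third is the increase attributable to the random domain frequencies.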


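40.36 If the sampling fraction is not uniform, the estimator (40.81) must be
changed to weight the stratum contributions properly. Instead, we put

    \hat{\mu}^{(d)} = \sum_l \frac{N_l}{n_l} \sum_j y_{lj}^{(d)} / \sum_l \frac{N_l}{n_l} n_l^{(d)}.    (40.92)

The reader is asked to show in Exercise 40.16 that this is asymptotically unbiassed
for \mu^{(d)}, with variance

    V(\hat{\mu}^{(d)}) \sim \frac{1}{(N^{(d)})^2} \sum_l \frac{N_l (N_l - n_l)}{n_l} \cdot \frac{1}{N_l} \sum_{j=1}^{N_l} (z_{lj} - \bar{z}_l)^2,    (40.93)

the generalization of (40.87), \bar{z}_l being the mean of the z_{lj} in the lth stratum, which reduces on substitution for z_{lj} to

    V(\hat{\mu}^{(d)}) \sim \frac{1}{(N^{(d)})^2} \sum_l \frac{N_l}{n_l} (1 - \frac{n_l}{N_l}) [ \sum_j (y_{lj}^{(d)} - \mu_l^{(d)})^2
                             + N_l^{(d)} (1 - \frac{N_l^{(d)}}{N_l}) (\mu_l^{(d)} - \mu^{(d)})^2 ],    (40.94)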
which generalizes (40.90). An estimator of (40.94) is 
— (у № E a NÈ (1-4 È id midya 
КОЕ (z A PCR 1 Ny Lii (17 — тін) 
а 
m ( - E (ті? — m) |; (40.95) 
1 


the derivation again being left to Exercise 40.16. 
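
As an illustration of the weighting in 40.36, the following sketch computes a domain mean in which each stratum's domain total and domain count are inflated by N_l/n_l. It assumes that (40.92) has this usual weighted-ratio form; the function name and the (y, h) data layout are hypothetical, chosen only for the example.

```python
def weighted_domain_mean(samples, stratum_sizes):
    """Weighted ratio estimate of a domain mean from a stratified sample.

    samples:       one list per stratum of (y, h) pairs for the sampled
                   units, h being the 0-1 domain indicator.
    stratum_sizes: the stratum population sizes N_l, in the same order.
    """
    numerator = 0.0
    denominator = 0.0
    for sample, N_l in zip(samples, stratum_sizes):
        weight = N_l / len(sample)  # inverse of the stratum sampling fraction
        numerator += weight * sum(y * h for y, h in sample)
        denominator += weight * sum(h for _, h in sample)
    return numerator / denominator
```

When every stratum has the same sampling fraction the weights cancel and the value coincides with the unweighted estimator (40.81).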


Domains in multi-stage sampling 

40.37 We shall confine our attention to multi-stage sampling in which s first-stage
units are selected with replacement from S units. Any number (including zero) of
stages of selection is permitted thereafter, but we restrict ourselves further to
self-weighting designs (cf. 39.41), for which the sample mean is the estimator of the
corresponding population mean \mu.

We wish to estimate the overall domain mean, written \mu^{(d)} as before, where

    \mu^{(d)} = \sum_{i=1}^{S} \sum_j ... \sum_p y_{ij...p}^{(d)} / N^{(d)}    (40.96)

and N^{(d)} is the total domain frequency as before,

    N^{(d)} = \sum_{i=1}^{S} \sum_j ... \sum_p h_{ij...p} = \sum_{i=1}^{S} N_i^{(d)},    (40.97)

the h_{ij...p} being 0-1 variables as at (40.77).


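40.38 The estimator is the sample analogue of \mu^{(d)},

    m^{(d)} = \sum_{i=1}^{s} \sum_j ... \sum_p y_{ij...p}^{(d)} / n^{(d)} = \sum_{i=1}^{s} y_i^{(d)} / n^{(d)},    (40.98)

where

    n^{(d)} = \sum_{i=1}^{s} \sum_j ... \sum_p h_{ij...p} = \sum_{i=1}^{s} n_i^{(d)}.    (40.99)

As at (40.82), to the first order of approximation,

    E(m^{(d)}) \sim E( \sum_i \sum_j ... \sum_p y_{ij...p}^{(d)} ) / E(n^{(d)})
               = f \sum_i \sum_j ... \sum_p y_{ij...p}^{(d)} / (f N^{(d)})
               = \mu^{(d)},    (40.100)

the common factor f in numerator and denominator being the overall probability of
selection for each value y in the population.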


40.39 Just as at (40.83), we define

    z_{ij...p} = h_{ij...p} (y_{ij...p} - \mu^{(d)}),    (40.101)

and find as at (40.84) that

    m^{(d)} - \mu^{(d)} = \sum_i \sum_j ... \sum_p z_{ij...p} / n^{(d)}
                        \sim \sum_i \sum_j ... \sum_p z_{ij...p} / E(n^{(d)}).    (40.102)

Thus, proceeding immediately to the estimation of the variance of m^{(d)}, we have

    \hat{V}(m^{(d)}) \sim \{ s / E(n^{(d)}) \}^2 \hat{V}(\bar{z}),    (40.103)

where \bar{z} is the mean of the s values \sum_j ... \sum_p z_{ij...p} = z_i, say. Now from (39.109)
and, in particular, if the probabilities of selection are the same at each first-stage drawing,
from (39.110), we have in this case, with t_i = z_i,

    \hat{V}(\bar{z}) = \sum_{i=1}^{s} (z_i - \bar{z})^2 / \{ s(s-1) \}.

Thus (40.103) becomes

    \hat{V}(m^{(d)}) \sim \frac{s}{(s-1) \{ E(n^{(d)}) \}^2} \sum_{i=1}^{s} (z_i - \bar{z})^2.    (40.104)

Just as at (40.87-8), we now resolve the sum of squares in (40.104), using (40.101), into

    \sum_i (z_i - \bar{z})^2 = \sum_i z_i^2 - s \bar{z}^2
        = \sum_i (y_i^{(d)} - n_i^{(d)} \mu^{(d)})^2 - \frac{(n^{(d)})^2}{s} (m^{(d)} - \mu^{(d)})^2,    (40.105)

using the definitions of y_i^{(d)} and n_i^{(d)} in (40.98-9). (40.104-5) give, assuming that
n^{(d)} \sim E(n^{(d)}),

    \hat{V}(m^{(d)}) \sim \frac{s}{(s-1) (n^{(d)})^2} \sum_i (y_i^{(d)} - n_i^{(d)} \mu^{(d)})^2 - \frac{1}{s-1} (m^{(d)} - \mu^{(d)})^2,    (40.106)

and the expectation of the second term on the right of (40.106) is -\frac{1}{s-1} V(m^{(d)}) to our
order of approximation. Thus, taking it to the left-hand side, we find

    \hat{V}(m^{(d)}) \sim \frac{1}{(n^{(d)})^2} \sum_i (y_i^{(d)} - n_i^{(d)} \mu^{(d)})^2.    (40.107)

(40.107) is still not a statistic, as it depends on \mu^{(d)}. To our order of approximation
this may be replaced by m^{(d)}, so that finally we obtain the estimator

    \hat{V}(m^{(d)}) = \frac{1}{(n^{(d)})^2} \sum_{i=1}^{s} (y_i^{(d)} - n_i^{(d)} m^{(d)})^2.    (40.108)

If there is no sampling after the first stage, (40.108) agrees to this order of approximation
with the result of Exercise 40.15.
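
The sum-of-squares resolution used at (40.105) is purely algebraic and can be verified on any set of first-stage totals. The sketch below is illustrative only: it takes hypothetical pairs (y_i, n_i), the domain total and domain count in each selected first-stage unit, together with an assumed value of the domain mean, and returns both sides of the identity.

```python
def resolve_sum_of_squares(units, mu_d):
    """Both sides of the identity at (40.105).

    units: list of (y_i, n_i) pairs, one per selected first-stage unit,
           giving the domain total and the domain count in that unit.
    mu_d:  an assumed value for the population domain mean.
    Returns (left_side, right_side).
    """
    s = len(units)
    n_d = sum(n_i for _, n_i in units)
    m_d = sum(y_i for y_i, _ in units) / n_d
    z = [y_i - n_i * mu_d for y_i, n_i in units]  # unit totals of the z-values
    z_bar = sum(z) / s
    left_side = sum((z_i - z_bar) ** 2 for z_i in z)
    right_side = sum(z_i ** 2 for z_i in z) - n_d ** 2 * (m_d - mu_d) ** 2 / s
    return left_side, right_side
```

The two returned values agree whatever value is supplied for mu_d, since the z_i sum to n^{(d)} (m^{(d)} - \mu^{(d)}).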


40.40 If we had (wrongly) taken the n_i^{(d)} to be fixed, we should have found for
the estimated "variance" of (40.98), from (39.110) with t_i = y_i^{(d)}/n^{(d)},

    \frac{1}{(n^{(d)})^2} \sum_{i=1}^{s} ( y_i^{(d)} - \frac{n^{(d)}}{s} m^{(d)} )^2.    (40.109)

Comparison of (40.108) and (40.109) shows that the variation in the n_i^{(d)} affects the
variance by replacing the average domain frequency in a first-stage unit, n^{(d)}/s, by the
individual first-stage unit domain frequencies, n_i^{(d)}. The increase in variance will be
large only if the n_i^{(d)} vary substantially and if they are negatively correlated with the y_i^{(d)},
which is unlikely in practice.


EXERCISES 
40.1 Show from (40.7) and (40.9) that 
и 
Say m/m my) 


is an approximately unbiassed estimator of ду. 
(cf. M. N. Murthy and Nanjamma, 1959) 


40.2 Show that the product estimator

    \tilde{\mu}_y = m_y m_x / \mu_x,

analogous to the ratio estimator (40.1), has bias C(m_y, m_x)/\mu_x and hence that
Ls [no omms (Xm) b эин) оче 
i-i 


is unbiassed. Show further that, as N \to \infty,


1 al ( Vo) , V) 20) 1 (шш: Co, 23) 

V(ày) = - x 

ы Hy ра [Я $ [Л taai Hy He 

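Hence show that if C(y, x) < 0, the product estimator is more efficient than the ratio
estimator (40.1), whose mean-square-error is given at (40.25); and that (cf. (40.27)) the
product estimator is more efficient than the sample mean m_y if

    \beta_{yx} < -\tfrac{1}{2} \mu_y / \mu_x < 0    or    \beta_{yx} > -\tfrac{1}{2} \mu_y / \mu_x > 0.

(cf. Robson, 1957)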
5 


40.3 In inverse sampling (cf. Examples 9.13, 34.1) of a population of N individuals with 
unequal probabilities and with replacement, sampling continues until (r +1) distinct individuals 
have been selected, when (п +1) observations will have been made (л >”). The last observation 

r 


is ignored, leaving ғ distinct values yi, observed m times respectively, with У m = m. 
1 


i=1 
r 
У тух їз ап unbiassed estimator of the population mean, and (cf. Exercise 
i=1 
39.12) that its variance is unbiassedly estimated by 


1 
Pw) = TEI m(yi— t)’. 
(Sampford (1962); Pathak (1964b) improves the estimator—cf. 39.5.) 


Show that = 


40.4 Show that if k strata of fixed sizes N_l are formed by random subdivision of a population
of size N, and then n observations are sampled with equal probabilities without replacement
using a USF, the variance of the estimator (39.38) over the entire procedure exactly equals that
of the mean of an unstratified sample of the same size from the original population.

Show further that if any allocation but a USF is used in the strata, the variance of (39.38)
is increased.


40.5 A sampling design, with one or more stages, selects n first-stage units with replacement 
N 


from a single stratum of N units, using unequal probabilities р, | E pr = 1 | at each drawing, 
i-i 

and subsequent stages are sampled independently within the selected first-stage units. This 

design is modified by first dividing the N first-stage units at random into » groups containing 

N,, Ns ..., Nn units, and selecting one unit from each group with the same relative proba- 

bilities as in the original scheme, 5;/5;, where ри, is the sum of the ру in the ith group, Sub- 
n 

sequent stages remain unchanged. Show that ta (2) = У pizi has the same expectation in 
i=1 


the modified design as has (ш) = 1 р) zi in the original design, and that 


i=1 
( X Ni-N) 
V(tu(2)) = ni Ио (а) 


N(N-1) 
N- E м) 
i=1 


*"CNN-D0 


[component of V(t;(zp!)) due to stages after the first]. 
40.6 In Exercise 40.5, show that if the N; are chosen to be equal, tm (2) is never less efficient
than (ж) for single-stage sampling, while it is given the maximum chance of being no less 
efficient in multi-stage sampling (cf. Exercise 40.4). 

(The results in this and the previous two exercises are given by Stuart (1964), 
who develops a generalization of the modified sampling scheme; see also 
J. N. K. Rao et al. (1962).) 


40.7 In an equal-probabilities random sample of л individuals from a population of N, 
only m, provide the information requested about the variable y. ОҒ the remaining п-т; = n; 
non-respondents, 1 in every Ё are subsampled with equal probabilities, giving a subsample of 
size т, = m/k. The cost function of the sample is 

C = соп+ сіт, + сул. 
The estimator of jy is 


fy Ln mem m), 


where т, т, are the means of the respondent sample and the subsample of non-respondents 
respectively. Show, using (39.72-3), that fy is unbiassed, with sampling variance 


ES ic as 1) 
von) = (1 ze ул, 


where o? is the population variance of y and oj is the population variance among potential non- 
respondents in the population, who form a proportion W, of the population. Show, using 
39.20, that (Ду) is minimized for fixed expected total cost if we choose А to satisfy 
goo act Wn 
МУ dicetaü ӘУ 
(cf. Hansen and Hurwitz, 1946) 


40.8 In Exercise 40.7, show that if k = 1, so that non-respondents are fully sampled, if 
a? = ой and № —- ©, and a sample is taken with the same expected total cost as the MV sample 
with А = Аму, then its estimation efficiency is given by 
2 


W,(1—W,)\ 1— 


1 kuy, 
Visk =1) Wa+0-W)/kiy ^ 
Show that if W, = 0-6 and Күү <3, the efficiency with k = 1294 per cent, illustrating the 
relative insensitivity of efficiency to departures from Аму. 


(Durbin, 1954) 


40.9 A very large population is sampled with equal probabilities on two successive occasions.
Its variance \sigma^2 is the same on both occasions, and there is correlation \rho between the values on
the two occasions. The first sample is of size n, with mean т. The second sample retains 
a fraction f of the first sample (with mean mi on the first occasion, m, on the second occasion) 
and replaces the remaining n(1—f) members of the first sample (with mean mj) by a fresh 
sample of the same size, with mean ту. Two independent estimators of the population 
mean on the second occasion are m; and 


Ay, = mi ba (тт), 
a two-phase regression estimator based on the observed regression coefficient 5,, of the second- 
occasion values upon the first-occasion values in the nf retained observations. Show that 


sete 
ИО) = ,, (1—fle*}, 
and hence that the MV linear combination of Ду, and mj is
f а-0@—-(@-/5о%) „, 


^-u-ü-pwey"* d-d-fw) ""' 
with sampling variance 
Via) = zl Er 
1-@-0*%* 


40.10 In Exercise 40.9, show further that the change in the population mean between the 
two occasions may be estimated by 
d, = тут 
ог 
d, = m -m 
which are independent. Show that the MV linear combination of d, and d, is 


1 аслас 
“= ул шшр” 
with sampling variance 
„2 (10) 
O+ ma-a- 


(Yates (1960); Patterson (1950) treats the theory for several occasions; Vos 
(1964) gives variance formulae for simultaneous sampling in time and space.) 


40.11 A simple random sample of size n is drawn from an infinite population consisting
of k strata containing fractions p_l of the population, the achieved sample in the lth stratum
being т, Xm =n. The intended stratum sample sizes аге mı, fixed in advance of sampling, 

1 


so a supplementary simple random sample of size m:—m is taken independently within each 
stratum for which m>m. If the cost per sample member in the initial sample is c, and the 
cost per sample member in the supplementary sample within the /th stratum is cic, while 
the value of a “surplus ” member in the Ith stratum (in the initial sample) is ci <c, show that 
the expected cost of achieving the intended stratified sample is 


E(C) = nc-- X [Prob {т <m} E(mi—m | т < mi)ei Prob (m2 mi) E(mi — т [mz mid], 
1 


and that if the m are large enough this is approximately 


гая a. ^ тпр 
EO + ves ад [ен er xu 


mi—npi " 
+{тр(1 -oef ds a + z (m —npi)ci» 


where G is the standardized normal d.fr. and g its f.f. Show that if n is increased by unity, 
the change in E(C) is 
AE(c) = c—X [Prob (m «mi) (ci — c) + ci] b: 


and that the least value of n for which AE(C)>0 approximately satisfies 


yfed E 
т {е (m«m) = 1, 
1 
reducing to 
LeipiProb (m«m) = с 
when c; = 0 (i.e. when surplus observations are valueless in any stratum), and to
" 
є—с| 


Prob (m < m} = CPU 


when all c; are equal, all cj are equal, and all 5; = H 


(Johnson (1957); Young (1961) considers a related sequential scheme 
in which the whole population is sampled until every әт is achieved.) 


40.12 Using (40.33), verify the expressions (40.44-9) for the expected values and variances 
of the three statistics discussed in 40.9. 


40.13 Form the differences қ") = ЖЕ) and v{.(™)} V(u) in 40.9, and show 
mr, Mr, mz, 


that these are positive. 
(Tin, 1965) 


40.14 In 40.9, show that the estimator 


E 


1 zy 
ET ^ me 


has to order n~? the expected value 


EQ) = m i- (204) (ба-бы-%4сыс-с} 


and the same variance аз u defined at (40.32), so that b and и are virtually equivalent. 
(Tin (1965)—the estimator is due to E. M. L. Beale.) 
40.15 Show that (40.90) may be estimated by 
(dy = (N-n) d m Ў (d туз п E d) „Фуз 
POY Ж Р m emen tar сена 
where mj? is the sample analogue of uj^. 
(Durbin, 1958) 


40.16 In 40.36, establish (40.94), the generalization of (40.90), and (40.95), generalizing 
Exercise 40.15. 
(Durbin, 1958) 


40.17 Verify the numerical values for the mean-square error of the six estimators in (40.29).


CHAPTER 41
MULTIVARIATE DISTRIBUTION THEORY


41.1 In a broad sense nearly the whole of this volume is devoted to multivariate 
analysis; that is to say, to the analysis of systems in which each member bears the 
values of more than one measured or classificatory variable. Up to the present, how- 
ever, we have usually managed to simplify the problems which arise: either (as for 
sample surveys) being concerned primarily with the estimation of one particular para- 
meter such as a mean, or (as in experimental design) arranging our regressor variables 
so that estimators of regression coefficients are orthogonal and allow of isolation of 
individual classification effects. We must now go further and consider systems of 
greater generality in which the variables are interdependent. In the present chapter 
we discuss some of the distributional problems which arise. Unless the contrary is 
stated, the underlying distributions will be assumed to be multivariate normal. It is
an unfortunate feature of this branch of the subject that, in other cases, very little is 
known about exact distribution theory. 


41.2 In Chapter 15, Vol. 1, we wrote the p-dimensional multivariate normal
distribution in two forms:

    dF = \frac{|\alpha|^{1/2}}{(2\pi)^{p/2}} \exp\{ -\tfrac{1}{2} \sum_{j,k=1}^{p} \alpha_{jk} (\frac{x_j - \mu_j}{\sigma_j}) (\frac{x_k - \mu_k}{\sigma_k}) \} \prod_{j=1}^{p} \frac{dx_j}{\sigma_j},
         -\infty \le x_j \le \infty,    j = 1, 2, ..., p,    (41.1)

    dF = \frac{|\alpha|^{1/2}}{(2\pi)^{p/2}} \exp\{ -\tfrac{1}{2} (x - \mu)' \alpha (x - \mu) \} \prod_{j=1}^{p} dx_j,    (41.2)

where \mu_j, \sigma_j^2 are the mean and variance of the jth variate and \alpha is a matrix inverse to
the dispersion matrix.

We were not, in that chapter, much concerned with sampling problems, but we
shall now require to distinguish between parent and sample values, or between
parameters and estimators. We shall accordingly write a_{jk} for the sample value of \alpha_{jk},
and c_{jk} for the sample covariance whose parent value is \gamma_{jk}, so that

    \gamma_{jk} = \rho_{jk} \sigma_j \sigma_k,    (41.3)
    c_{jk} = r_{jk} s_j s_k.    (41.4)

The dispersion matrix which, in Chapter 15, we wrote as V will now be written \gamma,
so that we have

    \alpha = \gamma^{-1}.    (41.5)

We recall from (15.15) (Vol. 1) that the characteristic function of (41.2) is given by

    \phi(t) = \exp(-\tfrac{1}{2} t' \gamma t) \exp(i t' \mu).    (41.6)


41.3 A sample of n values, typified by x_{jl}, l = 1, 2, ..., n, will yield a likelihood
function which is the product of n terms of type (41.1), and its logarithm will be the
sum of n terms. Denoting by S summation over sample, and by \Sigma summation over
variables, we find

    \frac{\partial \log L}{\partial \mu_j} = S \sum_{k=1}^{p} \frac{\alpha_{jk}}{\sigma_j \sigma_k} (x_k - \mu_k) = 0,    (41.7)

leading to

    \sum_{k=1}^{p} \frac{\alpha_{jk}}{\sigma_k} (\bar{x}_k - \hat{\mu}_k) = 0.    (41.8)

Since \alpha is not degenerate, the p equations of this type are equivalent to

    \hat{\mu}_k = \bar{x}_k,    k = 1, 2, ..., p.    (41.9)

It is no surprise to find that the sample mean is the ML estimator of the parent mean.


41.4 For the parameter \alpha_{jk} we have

    \frac{\partial \log L}{\partial \alpha_{jk}} = \frac{n}{2 |\alpha|} \frac{\partial |\alpha|}{\partial \alpha_{jk}} - \tfrac{1}{2} S (x_j - \mu_j)(x_k - \mu_k) = 0.    (41.10)

If A_{jk} is the co-factor of \alpha_{jk} in |\alpha|, we find on substituting for \mu the corresponding \bar{x},

    A_{jk} / |\alpha| = \frac{1}{n} S (x_j - \bar{x}_j)(x_k - \bar{x}_k) = c_{jk}.    (41.11)

It follows(*) that

    \hat{\gamma}_{jk} = c_{jk}.    (41.12)

In particular, the sample variances are ML estimators of the parent variances, and we
also have for the correlations

    \hat{\rho}_{jk} = \frac{S (x_j - \bar{x}_j)(x_k - \bar{x}_k)}{\{ S (x_j - \bar{x}_j)^2 \, S (x_k - \bar{x}_k)^2 \}^{1/2}} = r_{jk}.    (41.13)

This applies when all the parameters are under estimate. We shall not be concerned
with other cases, which are of very minor practical importance, but see Examples
18.14-15 (Vol. 2) and Exercise 18.14 for the bivariate case.
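
For readers who wish to compute the ML estimates of 41.3-41.4, a minimal sketch follows. It is not part of the original text; the function name and the row-per-observation data layout are assumptions made for illustration. Note the divisor n rather than n - 1 in the covariances.

```python
import math


def ml_estimates(rows):
    """ML estimates of means, dispersions and correlations.

    rows: list of n observations, each a list of p variate values.
    Returns (means, cov, corr), where cov uses the divisor n.
    """
    n = len(rows)
    p = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    cov = [
        [
            sum((row[j] - means[j]) * (row[k] - means[k]) for row in rows) / n
            for k in range(p)
        ]
        for j in range(p)
    ]
    corr = [
        [cov[j][k] / math.sqrt(cov[j][j] * cov[k][k]) for k in range(p)]
        for j in range(p)
    ]
    return means, cov, corr
```

The correlations are unaffected by the choice of divisor, since it cancels between numerator and denominator.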


41.5 In setting confidence intervals for these parameters we encounter the same
difficulties as in the univariate case, requiring distributions of the "Student" or \chi^2
type. We also have a new problem, that of setting simultaneous confidence intervals
to the components of a vector. Consider, for example, the estimation of means when
parent dispersions are known.

We saw in Chapter 15 that the variables in (41.1) could, by a linear transformation,
be reduced to independent normal variables with unit variances. It follows that

    (x - \mu)' \alpha (x - \mu)

is distributed as \chi^2 with p degrees of freedom. We shall show shortly (41.6) that the
distribution of means in multivariate normal samples is the same as the distribution
of the original variables, except that the dispersion matrix is \gamma/n. It follows, in view
of (41.5), that

    n (\bar{x} - \mu)' \gamma^{-1} (\bar{x} - \mu)    (41.14)

is distributed as \chi^2 with p d.fr. Thus, to a given probability level P we may make
assertions of the type

    Prob\{ n (\bar{x} - \mu)' \gamma^{-1} (\bar{x} - \mu) \le \chi_P^2 \} = P.    (41.15)

Since we are assuming \gamma known, this sets up a confidence region in the form of a quadric
in p dimensions. The practical interpretation of the result requires delicate handling.
We shall consider questions of bias in estimation in the next chapter. For the present
we are concerned with distributions.

(*) This is not, perhaps, immediately obvious. We appeal to a theorem, easily proved (cf. 8.9),
that if \phi_1, \phi_2, ..., \phi_m are functions of parameters \theta_1, \theta_2, ..., \theta_m, then ML estimators of the \phi's
are obtained by substituting in the functional relations the ML estimators of the \theta's.

It also follows that the ML estimators of partial and multiple correlations are given by the
corresponding sample statistics.
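
To make the region at (41.15) concrete, the sketch below treats the bivariate case, where the \chi^2 quantile with 2 d.fr. has the closed form -2 log(1 - P). It is an illustration under stated assumptions rather than a general routine: p = 2 only, \gamma known, the region taken as the interior of the quadric, and the function name hypothetical.

```python
import math


def in_confidence_region(xbar, mu, gamma, n, P):
    """Check whether mu lies in the region (41.15) when p = 2.

    xbar, mu: pairs of sample means and trial parent means.
    gamma:    known 2 x 2 dispersion matrix, as nested pairs.
    n:        sample size.
    P:        probability level.
    Returns (quadratic_form, chi2_quantile, inside).
    """
    (g11, g12), (g21, g22) = gamma
    det = g11 * g22 - g12 * g21
    d1 = xbar[0] - mu[0]
    d2 = xbar[1] - mu[1]
    # n * d' gamma^{-1} d, with the 2 x 2 inverse written out explicitly
    q = n * (g22 * d1 * d1 - (g12 + g21) * d1 * d2 + g11 * d2 * d2) / det
    # chi-square with 2 d.fr. has d.f. 1 - exp(-x/2), so the quantile is exact
    chi2_P = -2.0 * math.log(1.0 - P)
    return q, chi2_P, q <= chi2_P
```

For p greater than 2 the quantile has no such closed form and would have to be taken from tables.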
Wishart's distribution 
41.6 We now proceed to investigate the joint distribution of means and dispersions
in multivariate normal variation. Suppose we have a random sample of n individuals.
Writing x_{jl} for the lth observation on the jth variate, we may array the matrix of
observations as

    x = (x_{jl}) = \begin{pmatrix} x_{11} & x_{12} & ... & x_{1n} \\ x_{21} & x_{22} & ... & x_{2n} \\ ... & ... & ... & ... \\ x_{p1} & x_{p2} & ... & x_{pn} \end{pmatrix}.    (41.16)

The frequency distribution of the sample is then given by

    dF = \frac{|\alpha|^{n/2}}{(2\pi)^{np/2}} \exp\{ -\tfrac{1}{2} S \sum_{j,k=1}^{p} \alpha_{jk} (x_j - \mu_j)(x_k - \mu_k) \} \prod_{l=1}^{n} \prod_{j=1}^{p} dx_{jl}.    (41.17)

We already know from Example 11.7, and 16.25, in Vol. 1 that for p = 1, 2, the
distribution can be split into two independent distributions, one of means and the other
of dispersions. We prove, first of all, that the same is true of any value of p. We have
the familiar algebraic identity

    S \sum \alpha_{jk} (x_j - \mu_j)(x_k - \mu_k) = S \sum \alpha_{jk} (x_j - \bar{x}_j)(x_k - \bar{x}_k)
                                                  + n \sum \alpha_{jk} (\bar{x}_j - \mu_j)(\bar{x}_k - \mu_k).    (41.18)

Thus the exponent in (41.17) factorizes into two components. We now have to consider
the differential elements. It will be convenient to make on each variable an
orthogonal Helmert transformation of the type used in Example 11.3, Vol. 1:

    y_1 = (x_1 - x_2) / \sqrt{2},
    y_2 = (x_1 + x_2 - 2 x_3) / \sqrt{6},
    ...
    y_{n-1} = \{ x_1 + x_2 + ... + x_{n-1} - (n-1) x_n \} / \sqrt{n(n-1)},
    y_n = (x_1 + x_2 + ... + x_n) / \sqrt{n} = \sqrt{n} \, \bar{x},    (41.19)

where the suffixes are sample labels. The inverse relationship is

    x_1 = y_1/\sqrt{2} + y_2/\sqrt{6} + ... + y_{n-1}/\sqrt{n(n-1)} + y_n/\sqrt{n},
    x_2 = -y_1/\sqrt{2} + y_2/\sqrt{6} + ... + y_{n-1}/\sqrt{n(n-1)} + y_n/\sqrt{n},
    x_3 = -2 y_2/\sqrt{6} + y_3/\sqrt{12} + ... + y_{n-1}/\sqrt{n(n-1)} + y_n/\sqrt{n},
    ...
    x_n = -(n-1) y_{n-1}/\sqrt{n(n-1)} + y_n/\sqrt{n}.    (41.20)

|J| = 1 and the differential element in (41.17) becomes simply

    \prod_{l=1}^{n} \prod_{j=1}^{p} dy_{jl}.    (41.21)

Looking now to (41.18) and remembering that y_{jn} = \sqrt{n} \, \bar{x}_j we see that the second
factor is a function only of y_{jn} (j = 1, 2, ..., p). The first factor, in virtue of (41.20),
depends only on y_{j1}, ..., y_{j,n-1}.
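
The properties of the Helmert transformation that the argument relies on (orthogonality, a last coordinate equal to \sqrt{n} times the mean, and a sum of squares of the remaining coordinates equal to the sum of squares about the mean) can be checked directly. The following sketch is illustrative; the function names are our own and the matrix is built in the row order of (41.19).

```python
import math


def helmert_matrix(n):
    """Rows of the n x n orthogonal Helmert matrix, in the order of (41.19)."""
    rows = []
    for r in range(1, n):
        norm = math.sqrt(r * (r + 1))
        # r equal positive weights, then -r, then zeros
        rows.append([1.0 / norm] * r + [-r / norm] + [0.0] * (n - r - 1))
    rows.append([1.0 / math.sqrt(n)] * n)  # last row gives sqrt(n) * mean
    return rows


def transform(matrix, x):
    """Apply the matrix to the vector x, returning y."""
    return [sum(h * v for h, v in zip(row, x)) for row in matrix]
```

Applying transform(helmert_matrix(n), x) to any sample x gives y with y[-1] equal to \sqrt{n} times the sample mean, the remaining n - 1 coordinates carrying the whole of the variation about the mean.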
Hence we may factorize off from the original density element the second term in
(41.18) and an associated differential element in the \bar{x}. With an appropriate
adjustment to the constant factor we then obtain for the means

    dF = \frac{n^{p/2} |\alpha|^{1/2}}{(2\pi)^{p/2}} \exp\{ -\tfrac{1}{2} n \sum_{j,k=1}^{p} \alpha_{jk} (\bar{x}_j - \mu_j)(\bar{x}_k - \mu_k) \} \prod_{j=1}^{p} d\bar{x}_j.    (41.22)

The joint distribution of means is, in fact, the same as that of the original variables,
apart from the factor in n.


41.7 The distribution of the sample variances and covariances is thus confined to
the (n-1)-spaces orthogonal to the sample means. Since the orthogonal transformation
is simply a rotation of axes, it leaves distances and angles invariant. Since variances
and covariances are functions of these alone (cf. Example 11.7 and 16.24), they too are
invariant. The non-differential part arising from (41.18) and (41.22) may be written

    f = \frac{|\alpha|^{(n-1)/2}}{(2\pi)^{p(n-1)/2}} \exp( -\tfrac{1}{2} n \sum \alpha_{jk} c_{jk} ).    (41.23)

Our principal problem is to evaluate the differential element in terms of the c_{jk}.
Write

    u_{jl} = x_{jl} - \bar{x}_j.    (41.24)

Then the covariance c_{jk} is given by

    n c_{jk} = S u_j u_k.    (41.25)

Note that here and throughout we use n as the sample number, not the degrees of
freedom of the dispersions. We require that

    n - 1 \ge p.

Generalizing the argument of 16.24, Vol. 1, we take p flat spaces of n-1 dimensions,
one for each u, and let the sample points be represented by P_1, P_2, ..., P_p. We
shall consider in turn the variation in P_1, then that of P_2 for fixed P_1, then that of P_3 for
fixed P_1 and P_2, and so on. We shall then multiply all these together to find the variation
of P_1, P_2, ..., P_p.

Consider then P_m given P_1, P_2, ..., P_{m-1}. Let O be the origin and imagine the
spaces superposed on one another. For fixed OP_m and fixed angles P_m O P_1,
P_m O P_2, ..., P_m O P_{m-1}, the point P_m varies on a hypersphere of n-m dimensions.
Let the length of the perpendicular line from P_m on to the hyperplane determined
by O, P_1, ..., P_{m-1} be t_m. Then the "content" (volume) of variation permissible
to P_m is that of the "surface" of a hypersphere in n-m dimensions with radius t_m,
which is(*)

    \frac{2 \pi^{(n-m)/2} t_m^{n-m-1}}{\Gamma\{\tfrac{1}{2}(n-m)\}}.    (41.26)

We require to multiply this by the element of variation perpendicular to the
hypersphere. Consider the transformation in the m-space determined by O, P_1, ..., P_m
based on (41.25),

    \xi_{jm} = n c_{jm} = S u_j u_m,    j = 1, 2, ..., m.    (41.27)

The Jacobian of the transformation is given by

    J = \frac{\partial(\xi_{1m}, ..., \xi_{mm})}{\partial(u_{m1}, ..., u_{mm})}
      = \begin{vmatrix} u_{11} & u_{12} & ... & u_{1m} \\ u_{21} & u_{22} & ... & u_{2m} \\ ... & ... & ... & ... \\ 2u_{m1} & 2u_{m2} & ... & 2u_{mm} \end{vmatrix}    (41.28)

and this is equal to 2 v_m, where v_m is the volume of the parallelotope (the m-dimensional
parallelogram) determined by O, P_1, ..., P_m. Thus the differential element is

    \prod_{j=1}^{m} d\xi_{jm} / (2 v_m).    (41.29)

On multiplication by (41.26) the total variation of P_m is then given by

    \frac{\pi^{(n-m)/2} t_m^{n-m-1}}{\Gamma\{\tfrac{1}{2}(n-m)\} v_m} \prod_{j=1}^{m} d\xi_{jm}.    (41.30)

But we have

    t_m = v_m / v_{m-1},    (41.31)

and, from (41.27),

    |\xi_{jk}| = |u_{jl}|^2 = v_m^2.    (41.32)

The element of variation (41.30) then becomes

    \frac{\pi^{(n-m)/2} v_m^{n-m-2}}{\Gamma\{\tfrac{1}{2}(n-m)\} v_{m-1}^{n-m-1}} \prod_{j=1}^{m} d\xi_{jm}.    (41.33)

We now multiply expressions of type (41.33) for m = 1, 2, ..., p. The terms in v
cancel except for those in v_0 (which is unity) and v_p, and we find

    \frac{\pi^{p(2n-p-1)/4} v_p^{n-p-2}}{\prod_{j=1}^{p} \Gamma\{\tfrac{1}{2}(n-j)\}} \prod_{j \le k} d\xi_{jk}.    (41.34)

From (41.32) we have

    v_p^2 = |\xi_{jk}| = n^p |c|.    (41.35)

(*) Cf. Kendall, M. G., A Course in the Geometry of n Dimensions, p. 42.

Putting (41.34-5) and (41.23) together, we then have for the distribution of dispersions

    dF = \frac{n^{p(n-1)/2} |\alpha|^{(n-1)/2} |c|^{(n-p-2)/2}}{2^{p(n-1)/2} \pi^{p(p-1)/4} \prod_{j=1}^{p} \Gamma\{\tfrac{1}{2}(n-j)\}} \exp( -\tfrac{1}{2} n \sum_{j,k} \alpha_{jk} c_{jk} ) \prod_{j \le k} dc_{jk}.    (41.36)

This is Wishart's (1928) distribution. (11.41) in Example 11.7 is the case p = 1;
(16.54) is a simple transformation of the case p = 2.

Readers who find the geometrical line of proof hard to follow may prefer to consult
the review of alternative proofs in Wishart's article (1948). From many points of
view the distribution may be regarded as the generalization of the \chi^2 distribution to
p-dimensional variation.


41.8 Let us note some minor but not unimportant details concerning the distribution:

(a) In the exponent of (41.36) we are summing over all j, k and, since \alpha_{jk} = \alpha_{kj},
c_{jk} = c_{kj}, the terms occur in pairs except when j = k. Thus there are p terms of
type \alpha_{jj} c_{jj} and \tfrac{1}{2} p(p-1) of type 2 \alpha_{jk} c_{jk}. For example, with p = 2 the exponent is

    -\tfrac{1}{2} n (\alpha_{11} c_{11} + 2 \alpha_{12} c_{12} + \alpha_{22} c_{22}).

(b) In the differential element there are \tfrac{1}{2} p(p+1) terms, not p^2. Thus, for p = 2 the
element is dc_{11} dc_{12} dc_{22}.

(c) The domain of variation for each c_{jj} is 0 to \infty, but we cannot easily specify those
for the other c's, which are conditioned by the fact that the matrix (c_{jk}) must be
positive definite. This makes it extremely difficult to integrate out certain variables
c in order to obtain the marginal distributions of others.

(d) We have defined the sample variances and covariances by dividing the appropriate
product-sum by n. We may, if we prefer, divide by n-1, in which case appropriate
adjustments have to be made in (41.36). The reader should watch this point in
consulting the literature, because usage varies.


41.9 We can now derive the characteristic function of the Wishart distribution.
Writing a single integral sign to denote integration over the domain of c's, we have from
(41.36)

    \int |c|^{(n-p-2)/2} \exp( -\tfrac{1}{2} n \sum \alpha_{jk} c_{jk} ) \prod dc_{jk} = k |\alpha|^{-(n-1)/2},    (41.37)

where k is some constant. If we replace

    \alpha_{jj} by \alpha_{jj} - 2\theta_{jj}/n,    \alpha_{jk} by \alpha_{jk} - \theta_{jk}/n,  j \ne k,

the exponent under the integral sign gives us the c.f. of the c's with \theta_{jk} for the usual
imaginary dummy variable it_j. Making the substitutions on the right in (41.37), and
adjusting the constant to make \phi unity for zero \theta, we have

    \phi(\theta) = |\alpha|^{(n-1)/2} \begin{vmatrix} \alpha_{11} - 2\theta_{11}/n & \alpha_{12} - \theta_{12}/n & ... & \alpha_{1p} - \theta_{1p}/n \\ \alpha_{21} - \theta_{21}/n & \alpha_{22} - 2\theta_{22}/n & ... & \alpha_{2p} - \theta_{2p}/n \\ ... & ... & ... & ... \\ \alpha_{p1} - \theta_{p1}/n & \alpha_{p2} - \theta_{p2}/n & ... & \alpha_{pp} - 2\theta_{pp}/n \end{vmatrix}^{-(n-1)/2}.    (41.38)

This substitutional device, avoiding the problem of actually integrating over the domain
of variation, is a useful one which we shall employ again.


Example 41.1 
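In the bivariate case (p = 2) we have, in the usual notation (with unit variances),

    \alpha = \frac{1}{1-\rho^2} \begin{pmatrix} 1 & -\rho \\ -\rho & 1 \end{pmatrix},    |\alpha| = \frac{1}{1-\rho^2},

    |c| = \begin{vmatrix} s_1^2 & r s_1 s_2 \\ r s_1 s_2 & s_2^2 \end{vmatrix} = s_1^2 s_2^2 (1 - r^2).

Thus (41.36) reduces to

    dF = \frac{n^{n-1} s_1^{n-4} s_2^{n-4} (1-r^2)^{(n-4)/2}}{2^{n-1} (1-\rho^2)^{(n-1)/2} \pi^{1/2} \Gamma\{\tfrac{1}{2}(n-1)\} \Gamma\{\tfrac{1}{2}(n-2)\}}
         \exp\{ -\frac{n}{2(1-\rho^2)} (s_1^2 - 2\rho r s_1 s_2 + s_2^2) \} d(s_1^2) \, d(s_2^2) \, d(r s_1 s_2).

On using the duplication formula

    \Gamma\{\tfrac{1}{2}(n-1)\} \Gamma\{\tfrac{1}{2}(n-2)\} = \pi^{1/2} \Gamma(n-2) / 2^{n-3}

we find

    dF = \frac{n^{n-1} s_1^{n-4} s_2^{n-4} (1-r^2)^{(n-4)/2}}{4\pi (1-\rho^2)^{(n-1)/2} \Gamma(n-2)}
         \exp\{ -\frac{n}{2(1-\rho^2)} (s_1^2 - 2\rho r s_1 s_2 + s_2^2) \} d(s_1^2) \, d(s_2^2) \, d(r s_1 s_2),    (41.39)

which reduces to the form found in 16.26.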
Example 41.2 


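Let us now consider the moments of the distribution of the covariance when p = 2.
From (41.38) we have, putting \theta_{11} = \theta_{22} = 0,

    \phi(\theta_{12}) \propto \begin{vmatrix} \frac{1}{1-\rho^2} & -\frac{\rho}{1-\rho^2} - \frac{\theta_{12}}{n} \\ -\frac{\rho}{1-\rho^2} - \frac{\theta_{12}}{n} & \frac{1}{1-\rho^2} \end{vmatrix}^{-(n-1)/2}.    (41.40)

Expanding and evaluating the constant from the consideration that \phi(0) = 1 we find

    \phi(\theta_{12}) = \{ 1 - \frac{2\rho\theta_{12}}{n} - \frac{(1-\rho^2)\theta_{12}^2}{n^2} \}^{-(n-1)/2}.    (41.41)

Taking logarithms and evaluating the coefficients of powers of \theta_{12}, we find

    \kappa_1 = \frac{n-1}{n} \rho,    (41.42)

    \kappa_2 = \frac{n-1}{n^2} (1 + \rho^2),    (41.43)

    \kappa_3 = \frac{2(n-1)}{n^3} \rho (3 + \rho^2),    (41.44)

    \kappa_4 = \frac{6(n-1)}{n^4} (1 + 6\rho^2 + \rho^4).    (41.45)

In standard measure the distribution tends to normality as n tends to infinity. But
for finite n

    \beta_1 = \frac{4 \rho^2 (3 + \rho^2)^2}{(n-1)(1 + \rho^2)^3},    (41.46)

    \beta_2 = 3 + \frac{6}{n-1} \cdot \frac{1 + 6\rho^2 + \rho^4}{(1 + \rho^2)^2}.    (41.47)

Thus, even for \rho = 0, the distribution, though symmetrical, is not normal. We have
derived these results differently in Example 13.3, Vol. 1.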
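
The cumulants of Example 41.2 can be generated mechanically. The sketch below rests on a step not spelled out in the text: the quadratic in (41.41) factorizes as {1 - (1+\rho)\theta/n}{1 - (\rho-1)\theta/n}, so that the rth cumulant is \tfrac{1}{2}(n-1)(r-1)!{(1+\rho)^r + (\rho-1)^r}/n^r. The function names are hypothetical.

```python
import math


def covariance_cumulant(r, n, rho):
    """r-th cumulant of the sample covariance (divisor n), p = 2, unit variances.

    Uses the factorization of the quadratic in (41.41) into two linear factors.
    """
    a = 1.0 + rho
    b = rho - 1.0
    return 0.5 * (n - 1) * math.factorial(r - 1) * (a ** r + b ** r) / n ** r


def beta_coefficients(n, rho):
    """(beta_1, beta_2) of the sample covariance, from the first four cumulants."""
    k2 = covariance_cumulant(2, n, rho)
    k3 = covariance_cumulant(3, n, rho)
    k4 = covariance_cumulant(4, n, rho)
    return k3 ** 2 / k2 ** 3, 3.0 + k4 / k2 ** 2
```

With rho = 0 the first coefficient vanishes while the second exceeds 3 by 6/(n-1), which is the non-normality remarked on in the Example.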

Wishart (1929) gave the formulae explicitly for p<8 as far as those of the fourth 
order. 


The additive property of Wishart distributions 

41.10 One property of the Wishart distribution, analogous to that of \chi^2 in one
dimension, is worth noticing.

Suppose we have two samples, n_1 and n_2 in number, from the same multivariate
distribution. If we pool them, of course, the dispersions of the total sample will
follow the Wishart distribution for a sample of n_1 + n_2. But we may also consider
the joint distribution of the dispersions from each sample. If we form a new dispersion
matrix by adding corresponding dispersions, i.e.

    c_{jk} = c_{jk}^{(1)} + c_{jk}^{(2)},    (41.48)

where the superfixes refer to the first and second sample, then the c_{jk}'s are distributed
in Wishart's form with n_1 + n_2 - 1 instead of n.

This is perhaps most easily seen from the characteristic function. If we adjoin
the distributions of c^{(1)} and c^{(2)} it will be clear, as in (41.37), that the c.f. of c itself is
of the same form as (41.38).


41.11 It would add a pleasant completeness to the sampling theory of dispersions 
if we could proceed from the Wishart distribution and deduce the distribution of 
particular functions of the variances and covariances by integrating over appropriate 
domains to eliminate unwanted variables. Unfortunately this is prohibitively compli- 
cated in general. As in Example 41.2 we can derive moments and product-moments 
of sample dispersions; finding the explicit distributions is then usually unnecessary. 
There are, however, a few cases in which we can proceed further. 


Moments of the dispersion determinant (generalized variance) 
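41.12 Consider the distribution of the dispersion determinant |c|. From
(41.36), the integral being taken over the variances and covariances,

    \int \frac{n^{p(n-1)/2} |\alpha|^{(n-1)/2} |c|^{(n-p-2)/2}}{2^{p(n-1)/2} \pi^{p(p-1)/4} \prod_{j=1}^{p} \Gamma\{\tfrac{1}{2}(n-j)\}} \exp( -\tfrac{1}{2} n \sum \alpha_{jk} c_{jk} ) \prod dc_{jk} = 1.    (41.49)

Put n \alpha_{jk} = \beta_{jk}. We have then

    \int |\beta|^{(n-1)/2} |c|^{(n-p-2)/2} \exp( -\tfrac{1}{2} \sum \beta_{jk} c_{jk} ) \prod dc_{jk} = 2^{p(n-1)/2} \pi^{p(p-1)/4} \prod_{j=1}^{p} \Gamma\{\tfrac{1}{2}(n-j)\}.    (41.50)

Replace n by n + 2t and divide by |\beta|^t. We then have

    \int |\beta|^{(n-1)/2} |c|^{(n-p-2)/2 + t} \exp( -\tfrac{1}{2} \sum \beta_{jk} c_{jk} ) \prod dc_{jk}
        = \frac{2^{p(n-1)/2 + pt} \pi^{p(p-1)/4}}{|\beta|^t} \prod_{j=1}^{p} \Gamma\{\tfrac{1}{2}(n-j) + t\}.    (41.51)

If we now replace \beta_{jk} by n \alpha_{jk} and divide by 2^{p(n-1)/2} \pi^{p(p-1)/4} \prod \Gamma\{\tfrac{1}{2}(n-j)\} we have on the left
the expectation of |c|^t. Thus

    E(|c|^t) = \frac{2^{pt}}{|n\alpha|^t} \prod_{j=1}^{p} \frac{\Gamma\{\tfrac{1}{2}(n-j) + t\}}{\Gamma\{\tfrac{1}{2}(n-j)\}}    (41.52)

             = \frac{2^{pt} |\gamma|^t}{n^{pt}} \prod_{j=1}^{p} \frac{\Gamma\{\tfrac{1}{2}(n-j) + t\}}{\Gamma\{\tfrac{1}{2}(n-j)\}}.    (41.53)

From this we can determine as many moments or cumulants as we wish. Again
the substitutional device, obviating the awkward integrations, is to be noted.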


41.13 One consequence of (41.53) is noteworthy. If we write d_{jk} for the sums
of products about the mean, so that d_{jk} = n c_{jk}, we have

    E( |d|^t / |\gamma|^t ) = 2^{pt} \prod_{j=1}^{p} \frac{\Gamma\{\tfrac{1}{2}(n-j) + t\}}{\Gamma\{\tfrac{1}{2}(n-j)\}}.    (41.54)

Now a \chi^2 variable with \nu degrees of freedom has moments about zero given by

    \mu'_t = 2^t \frac{\Gamma(\tfrac{1}{2}\nu + t)}{\Gamma(\tfrac{1}{2}\nu)}.

The right-hand side of (41.54) is the product of p such factors with \nu = n-1,
n-2, ..., n-p. Remembering that the moment of a product of independent variables
is the product of their moments, we see that |d| / |\gamma| can be represented as the product
of p independent factors, distributed as \chi^2 with n-1, n-2, ..., n-p degrees of
freedom.
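
The factorization of the moments in 41.13 is easy to confirm numerically. The sketch below is our own illustration (function names hypothetical); it evaluates the right-hand side of (41.54) and the moment of a single \chi^2 variable through the log-Gamma function, so that large arguments cause no overflow.

```python
import math


def gen_variance_moment(t, n, p):
    """E(|d|^t / |gamma|^t) from (41.54), for sample size n and p variates."""
    log_moment = p * t * math.log(2.0)
    for j in range(1, p + 1):
        half_df = 0.5 * (n - j)
        log_moment += math.lgamma(half_df + t) - math.lgamma(half_df)
    return math.exp(log_moment)


def chi2_moment(t, nu):
    """t-th moment about zero of a chi-square variable with nu d.fr."""
    return math.exp(
        t * math.log(2.0) + math.lgamma(0.5 * nu + t) - math.lgamma(0.5 * nu)
    )
```

For t = 1 the moment reduces to (n-1)(n-2)...(n-p), the product of the means of the p chi-square factors.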


Example 41.3 
When p = 1 we have the familiar result that a sum of squares, standardized by
division by a parent variance, is distributed as \chi^2 with n-1 d.fr.

For p = 2 we find from (41.54) for the moments of |d| / |\gamma|

    \mu'_t = 2^{2t} \frac{\Gamma\{\tfrac{1}{2}(n-1) + t\} \, \Gamma\{\tfrac{1}{2}(n-2) + t\}}{\Gamma\{\tfrac{1}{2}(n-1)\} \, \Gamma\{\tfrac{1}{2}(n-2)\}}.    (41.55)

From the duplication formula for the Gamma function,

    \Gamma(x) \Gamma(x + \tfrac{1}{2}) = \frac{\pi^{1/2} \Gamma(2x)}{2^{2x-1}},    (41.56)

we reduce this to

    \mu'_t = \frac{\Gamma(n-2+2t)}{\Gamma(n-2)}.    (41.57)

This is the (2t)th moment of a Gamma variable with parameter (n-2); cf. Exercise 4.6,
Vol. 1. Since a \chi^2 variable with 2(n-2) d.fr. is twice such a Gamma variable, the tth
moment of 4|d| / |\gamma| is equal to the (2t)th moment of a \chi^2
variable with 2(n-2) d.fr. This is a special case of the result of Exercise 11.9, Vol. 1.
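
The reduction from (41.55) to (41.57) by the duplication formula can be confirmed numerically for non-integral t as well. This is a small check of our own, with hypothetical function names, not a part of the original Example.

```python
import math


def moment_direct(t, n):
    """Moment of |d|/|gamma| for p = 2, in the two-Gamma form (41.55)."""
    log_moment = 2.0 * t * math.log(2.0)
    for half_df in (0.5 * (n - 1), 0.5 * (n - 2)):
        log_moment += math.lgamma(half_df + t) - math.lgamma(half_df)
    return math.exp(log_moment)


def moment_reduced(t, n):
    """The same moment in the reduced single-Gamma form (41.57)."""
    return math.exp(math.lgamma(n - 2 + 2.0 * t) - math.lgamma(n - 2))
```

With t = 1 both forms give (n-1)(n-2), the product of the means of \chi^2 variables with n-1 and n-2 d.fr.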


The correlation determinant 

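41.14 There is considerable interest in a study of the joint distribution of sample correlations. In the general case (non-vanishing parent correlations) distributions are complicated, even for $p = 2$; cf. 16.26-7. We can make some progress in the null case, i.e. when all parental correlations are zero. The Wishart distribution (41.36) then reduces to

\[
dF = \frac{n^{\frac12 p(n-1)} \bigl(\prod_{j} \alpha_{jj}\bigr)^{\frac12(n-1)}\, |c|^{\frac12(n-p-2)}}{2^{\frac12 p(n-1)}\, \pi^{\frac14 p(p-1)} \prod_{j=1}^{p} \Gamma\{\tfrac12(n-j)\}} \exp\Bigl(-\tfrac12 n \sum_{j} \alpha_{jj} c_{jj}\Bigr) \prod dc_{jk}. \tag{41.58}
\]

Transforming to new variables by equations of the type

\[
c_{jk} = s_j s_k r_{jk}, \tag{41.59}
\]

we find that the Jacobian is given by

\[
J = 2^{p} \prod_{j=1}^{p} s_j^{\,p}. \tag{41.60}
\]

This is independent of the $r_{jk}$, as is the exponent in (41.58), and consequently terms in $s$ can be factorized off, leaving us with the distribution of correlations

\[
dF = \frac{[\Gamma\{\tfrac12(n-1)\}]^{p}}{\pi^{\frac14 p(p-1)} \prod_{j=1}^{p} \Gamma\{\tfrac12(n-j)\}}\; |r|^{\frac12(n-p-2)} \prod dr_{jk}, \tag{41.61}
\]

where $|r|$ is the sample correlation determinant and the constant has been adjusted so that the frequency function integrates to unity. We are again hindered from making progress by the boundaries of the domain of variation. In the manner of 41.12, however, we can find the moments of $|r|$ itself. They are

\[
E(|r|^t) = \frac{[\Gamma\{\tfrac12(n-1)\}]^{p} \prod_{j=1}^{p} \Gamma\{\tfrac12(n-j)+t\}}{[\Gamma\{\tfrac12(n-1)+t\}]^{p} \prod_{j=1}^{p} \Gamma\{\tfrac12(n-j)\}}. \tag{41.62}
\]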
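Example 41.4
Writing $L = \log|r|$, $\nu = \tfrac12(n-1)$, $m = \tfrac12(j-1)$ we have from (41.62)

\[
E(e^{tL}) = \prod_{j=1}^{p} \frac{\Gamma(\nu)\, \Gamma(\nu-m+t)}{\Gamma(\nu+t)\, \Gamma(\nu-m)}. \tag{41.63}
\]

This was shown to be true for integral $t$ only, but by analytic continuation may be shown to be generally true for all $t$. We may therefore interpret $t$ as an imaginary number, and (41.63) then gives us the characteristic function of $L$. For the corresponding cumulant-generating function we then have

\[
\psi = \sum_{j=1}^{p} \{\log\Gamma(\nu) + \log\Gamma(\nu-m+t) - \log\Gamma(\nu+t) - \log\Gamma(\nu-m)\}. \tag{41.64}
\]

We recall the expansion

\[
\log\Gamma(x) = (x-\tfrac12)\log x - x + \tfrac12\log(2\pi) + \frac{1}{12x} - \frac{1}{360x^{3}} + \ldots
\]

For large $\nu$, substitution in (41.64) gives us for the term in braces

\[
(\nu-\tfrac12)\log\nu + (\nu-m+t-\tfrac12)\log(\nu-m+t) - (\nu+t-\tfrac12)\log(\nu+t) - (\nu-m-\tfrac12)\log(\nu-m) + O(\nu^{-3}) \tag{41.65}
\]
\[
= -\frac{mt}{\nu} - \frac{mt(m+1-t)}{2\nu^{2}} + O(\nu^{-3}). \tag{41.66}
\]

Now put $m = \tfrac12(j-1)$ and sum from $j = 1$ to $p$. We find

\[
\psi = -\tfrac14 p(p-1)\,\frac{t}{\nu} - \frac{p(p-1)\,t\,(2p+5-6t)}{48\nu^{2}} + O(\nu^{-3}). \tag{41.67}
\]

Thus

\[
E(L) = -\tfrac14 p(p-1)/\nu + O(\nu^{-2}), \tag{41.68}
\]
\[
\operatorname{var} L = \tfrac14 p(p-1)/\nu^{2} + O(\nu^{-3}). \tag{41.69}
\]

Furthermore we have

\[
\Bigl(\frac{d}{dx}\Bigr)^{k} \log\Gamma(x) = (-1)^{k}(k-2)!\; x^{-(k-1)} + O(x^{-k}), \qquad k > 1. \tag{41.70}
\]

Hence, from (41.64), to the greatest power in $\nu$,

\[
\Bigl[\frac{d^{k}\psi}{dt^{k}}\Bigr]_{t=0} = \sum_{j=1}^{p} (-1)^{k}(k-2)!\,\{(\nu-m)^{-(k-1)} - \nu^{-(k-1)}\}
= \sum_{j=1}^{p} (-1)^{k}(k-1)!\; m\,\nu^{-k}, \quad \text{where } m = \tfrac12(j-1),
\]
\[
= (-2\nu)^{-k}\, 2^{k-1}\,(k-1)!\; \tfrac12 p(p-1). \tag{41.71}
\]

Comparison with equation (16.4), Volume 1, shows that, with a suitable choice of origin, $-2\nu\log|r|$ is asymptotically distributed as $\chi^2$ with $\tfrac12 p(p-1)$ degrees of freedom. To order $\nu^{-1}$ the origin, from (41.68), is seen to be zero.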

Thus $-(n-1)\log|r|$ is asymptotically distributed as $\chi^2$ with $\tfrac12 p(p-1)$ d.fr. Bartlett (1951) gave a slightly more refined result, namely that

\[
-\{n - \tfrac16(2p+11)\}\log|r|
\]

is approximately $\chi^2$ with $\tfrac12 p(p-1)$ d.fr. The extra term derives from an allowance for the terms of order $n^{-1}$ in the mean, but in practice this is a refinement of minor importance.
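
As an illustration of how the criterion might be computed in practice, the following sketch evaluates Bartlett's statistic for a given sample correlation matrix. The helper names and the small worked matrix are assumptions made for the example; the determinant is obtained by ordinary Gaussian elimination so that no external library is required.

```python
import math

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    size = len(a)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det

def bartlett_statistic(r, n):
    """Return (-{n - (2p+11)/6} log|r|, degrees of freedom p(p-1)/2)."""
    p = len(r)
    multiplier = n - (2 * p + 11) / 6.0
    return -multiplier * math.log(determinant(r)), p * (p - 1) // 2

r = [[1.0, 0.5], [0.5, 1.0]]           # hypothetical sample correlations
stat, df = bartlett_statistic(r, n=20)
print(stat, df)                        # to be referred to chi-squared with df d.fr.
```

A value of the statistic that is large relative to the $\chi^2$ distribution with $\tfrac12 p(p-1)$ d.fr. throws doubt on the hypothesis that all parent correlations vanish.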


Example 41.5

It is interesting to compare the results of 41.13, concerning the dispersion determinant, with those of the previous example for the correlation determinant in the null case.

Without loss of generality, suppose that the parent variances are unity and the parent correlations zero. Then, from 41.13,

\[
n^{p}\,|c| = (ns_1^{2})(ns_2^{2}) \cdots (ns_p^{2})\,|r| \tag{41.72}
\]

is the product of $p$ independent $\chi^2$ variables with $n-1, n-2, \ldots, n-p$ degrees of freedom, whereas $-(n-1)\log|r|$ is asymptotically $\chi^2$ with $\tfrac12 p(p-1)$ degrees of freedom.

Now from (27.61) applied to the sample instead of the population, we have in our present notation

\[
1 - R^{2}_{1(2 \ldots p)} = \frac{|r|}{R_{11}}, \tag{41.73}
\]

where $R_{1(2 \ldots p)}$ is the multiple correlation coefficient of $x_1$ on $x_2, x_3, \ldots, x_p$ and $R_{11}$ is the cofactor of $r_{11}$ ($= 1$) in the correlation determinant. By repeated applications of this formula we have, on re-ordering suffixes,

\[
\{1-R^{2}_{p(12 \ldots p-1)}\}\{1-R^{2}_{p-1(12 \ldots p-2)}\} \cdots \{1-R^{2}_{3(12)}\}\{1-R^{2}_{2(1)}\} = |r|, \tag{41.74}
\]

where $R^{2}_{2(1)}$ is the same as the zero order correlation $r^{2}_{12}$. Moreover, all the $x$'s are independent, so all the factors on the left in (41.74) are independent (cf. 27.30).

The distribution of $U = 1 - R^2$, based on a sample of $n$ and $q$ variables, is (cf. Chapter 27, Vol. 2)

\[
dF = \frac{1}{B\{\tfrac12(n-q), \tfrac12(q-1)\}}\; U^{\frac12(n-q-2)} (1-U)^{\frac12(q-3)}\, dU. \tag{41.75}
\]

Thus $|r|$ is distributed exactly as the product of $(p-1)$ independent Beta variables with parameters $(\tfrac12(n-q), \tfrac12(q-1))$, $q = 2, 3, \ldots, p$.

By the same kind of argument as we used in the previous example we can find the moments of $U$ and hence the characteristic function of $\log U$. In fact, $-(n-1)\log U$ is found to be distributed approximately as $\chi^2$ with $q-1$ degrees of freedom.

Thus $-(n-1)\log|r|$ is approximately the sum of $p-1$ independent $\chi^2$ variables with $p-1, p-2, \ldots, 1$ degrees of freedom, namely a $\chi^2$ with $\tfrac12 p(p-1)$ d.fr. This checks with the result we obtained in the previous example.

The ratio $|r|/R_{11}$ is also equal (cf. (27.34)) to $s^{2}_{1.2 \ldots p}/s_1^{2}$. In $R_{11}$ the corresponding ratio is equal to $s^{2}_{2.3 \ldots p}/s_2^{2}$, and so on. Thus from (41.72-4)

\[
n^{p}\,|c| = \{ns^{2}_{1.2 \ldots p}\}\{ns^{2}_{2.3 \ldots p}\} \cdots \{ns^{2}_{p}\}. \tag{41.76}
\]

The sums of squares of type $ns^2$ are residuals which are all independent, and are based on $n-p, n-p+1, \ldots, n-1$ degrees of freedom. Thus $n^{p}|c|$ is the product of independent $\chi^2$ factors with those d.fr., confirming the results of 41.13.


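Hotelling's T²
41.15 We proceed to derive a generalization, due to Hotelling, of "Student's" $t$. As in 41.13, we write $d_{jk} = nc_{jk}$. Let $(D_{jk})$ be the inverse of $(d_{jk})$. Define

\[
T^{2} = n(n-1) \sum_{j,k=1}^{p} D_{jk}\, \bar{x}_j \bar{x}_k. \tag{41.77}
\]

When $p = 1$ we have $d_{11} = ns_1^{2}$, $D_{11} = 1/(ns_1^{2})$, and

\[
T^{2} = n(n-1)\,\frac{\bar{x}_1^{2}}{ns_1^{2}} = (n-1)\,\frac{\bar{x}_1^{2}}{s_1^{2}} = t^{2},
\]

which illustrates how $T$ reduces to "Student's" $t$ in the univariate case.
Let $m_{jk}$ denote the sums of squares and products about the origin so that

\[
m_{jk} = d_{jk} + n\bar{x}_j \bar{x}_k. \tag{41.78}
\]

The determinant $|m_{jk}|$ may be written

\[
\begin{vmatrix}
1 & \sqrt{n}\,\bar{x}_1 & \sqrt{n}\,\bar{x}_2 & \cdots & \sqrt{n}\,\bar{x}_p \\
0 & d_{11}+n\bar{x}_1^{2} & d_{12}+n\bar{x}_1\bar{x}_2 & \cdots & d_{1p}+n\bar{x}_1\bar{x}_p \\
0 & d_{21}+n\bar{x}_2\bar{x}_1 & d_{22}+n\bar{x}_2^{2} & \cdots & d_{2p}+n\bar{x}_2\bar{x}_p \\
\vdots & \vdots & \vdots & & \vdots \\
0 & d_{p1}+n\bar{x}_p\bar{x}_1 & d_{p2}+n\bar{x}_p\bar{x}_2 & \cdots & d_{pp}+n\bar{x}_p^{2}
\end{vmatrix}. \tag{41.79}
\]

Subtracting $\bar{x}_1\sqrt{n}$ times the first row from the second, $\bar{x}_2\sqrt{n}$ times the first row from the third, and so on, we find

\[
\begin{vmatrix}
1 & \sqrt{n}\,\bar{x}_1 & \cdots & \sqrt{n}\,\bar{x}_p \\
-\bar{x}_1\sqrt{n} & d_{11} & \cdots & d_{1p} \\
-\bar{x}_2\sqrt{n} & d_{21} & \cdots & d_{2p} \\
\vdots & \vdots & & \vdots \\
-\bar{x}_p\sqrt{n} & d_{p1} & \cdots & d_{pp}
\end{vmatrix}. \tag{41.80}
\]

Expand by the border row and column. We find

\[
|m_{jk}| = |d_{jk}| + n \sum_{j,k=1}^{p} D_{jk}\, \bar{x}_j \bar{x}_k\, |d_{jk}|. \tag{41.81}
\]

From (41.77) it then follows that

\[
\frac{|d_{jk}|}{|m_{jk}|} = \frac{1}{1 + T^{2}/(n-1)}. \tag{41.82}
\]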


41.16 Consider the geometrical interpretation of this result. In the case $p = 1$ the numerator and denominator of (41.82) reduce to $d_{11}$ and $m_{11}$, that is to say, to the squares of the distances from the sample point $P_1$ (in the $n$-dimensional sample space) to its projection on the unit vector whose direction cosines are all equal, and from $P_1$ to the origin $O$ respectively. The ratio is the square of the sine of the angle between $OP_1$ and the unit vector. This was the geometrical approach which gave us "Student's" distribution in Example 11.8 (Vol. 1).

In the general case, consider the $p$ superposed sample spaces discussed in the derivation of Wishart's distribution in 41.7. From a relation similar to that of (41.32) we see that $|m_{jk}|$ is the square of the volume of a parallelotope (generalized parallelogram) with one corner at the origin and sides parallel to $OP_1, OP_2, \ldots, OP_p$.

Further, if $H$ is a hyperplane perpendicular to the unit vector meeting it in $O'$, it is easy to see that the projections of $P_1, P_2, \ldots, P_p$ on to $H$, say $P_1', P_2', \ldots, P_p'$, are such that $d_{jk}$ represents sums of squares and cross-products in $H$ referred to $O'$ as origin. Thus $|d_{jk}|$ is the square of the volume of a parallelotope in $H$. Thus the ratio of $|d_{jk}|$ to $|m_{jk}|$ is the square of the cosine of the angle between $H$ and the unit vector. If the angle is $\theta$ we then have

\[
\frac{1}{1 + T^{2}/(n-1)} = \cos^{2}\theta. \tag{41.83}
\]

Now if the sample points $P$ are distributed at random in the $n$-spaces, the hyperplane which they determine is distributed at random in regard to the angle which it makes with a fixed vector, and in particular the unit vector. The sampling distribution of $\theta$ is then that of an angle between a fixed vector and a random hyperplane. But this is the same as the distribution of the angle between a fixed hyperplane and a random vector. And this, from a slightly different viewpoint, is the problem of distribution which we solved in connection with the multiple correlation coefficient $R$, for we saw (27.26, Vol. 2) that $R$ can be regarded as the sine of the angle between a residual variable represented by $x_{1.2 \ldots p}$ and the space containing the other variables $x_2, \ldots, x_p$, and when the former is independent of the latter we can regard one of them as fixed. Thus we may write

\[
\frac{1}{1 + T^{2}/(n-1)} = 1 - R^{2}, \tag{41.84}
\]

where we must remember that in the distribution of $R^2$, namely

\[
dF = \frac{1}{B\{\tfrac12(n-p), \tfrac12(p-1)\}}\, (1-R^{2})^{\frac12(n-p-2)} (R^{2})^{\frac12(p-3)}\, dR^{2}, \tag{41.85}
\]

$p$ is the total number of variables and the variate values are measured from their means in forming the regression equation. In applying (41.85) to Hotelling's distribution we must increase $p$ by unity, for we are effectively considering $p+1$ variables, the unit vector being the extra one; and we must increase $n$ by unity because our variation is not restricted to that about the mean. Making these adjustments and substituting in (41.85) from (41.84), we find for the distribution of $T^2$

\[
dF = \frac{1}{B\{\tfrac12(n-p), \tfrac12 p\}} \Bigl(\frac{T^{2}}{n-1}\Bigr)^{\frac12(p-2)} \Bigl(1 + \frac{T^{2}}{n-1}\Bigr)^{-\frac12 n} d\Bigl(\frac{T^{2}}{n-1}\Bigr). \tag{41.86}
\]

Equivalently, we may say that

\[
\frac{n-p}{p} \cdot \frac{T^{2}}{n-1} \ \text{has an } F(p, n-p) \text{ distribution.} \tag{41.87}
\]

As for the Wishart distribution in 41.7, we require $n-1 > p$ for the validity of the above theory, which may be used in the obvious way to test a hypothetical vector of means $\mu_0$, by measuring $\bar{x}$ from this as origin and then using the distribution (41.87).
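
The test just described may be sketched for the bivariate case as follows. This is an illustration under our own assumptions: the function name, the data and the hypothetical mean vector are invented for the example. The routine computes $T^2$ from (41.77), the variance-ratio form of (41.87), and the determinantal ratio $|d|/|m|$, so that the identity (41.82) can be confirmed numerically.

```python
import math

def hotelling_t2_bivariate(xs, ys, mu0=(0.0, 0.0)):
    """One-sample T^2 for p = 2, measuring the observations from mu0 as origin."""
    n = len(xs)
    u = [x - mu0[0] for x in xs]
    v = [y - mu0[1] for y in ys]
    ubar, vbar = sum(u) / n, sum(v) / n
    # sums of squares and products about the mean (the matrix d)
    d11 = sum((a - ubar) ** 2 for a in u)
    d22 = sum((b - vbar) ** 2 for b in v)
    d12 = sum((a - ubar) * (b - vbar) for a, b in zip(u, v))
    det_d = d11 * d22 - d12 ** 2
    # quadratic form in the inverse of d
    quad = (d22 * ubar ** 2 - 2.0 * d12 * ubar * vbar + d11 * vbar ** 2) / det_d
    t2 = n * (n - 1) * quad
    # sums of squares and products about the origin (the matrix m), as in (41.78)
    m11 = d11 + n * ubar ** 2
    m22 = d22 + n * vbar ** 2
    m12 = d12 + n * ubar * vbar
    det_m = m11 * m22 - m12 ** 2
    f_stat = (n - 2) * t2 / (2.0 * (n - 1))   # F with (2, n-2) d.fr. by (41.87)
    return t2, f_stat, det_d / det_m

xs = [1.2, 0.8, 1.9, 2.4, 0.3, 1.1]
ys = [0.5, 1.5, 2.2, 1.8, 0.1, 0.9]
t2, f_stat, ratio = hotelling_t2_bivariate(xs, ys, mu0=(1.0, 1.0))
print(t2, f_stat, ratio, 1.0 / (1.0 + t2 / (len(xs) - 1)))
```

The last two printed values coincide, in accordance with (41.82).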


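41.17 The same result may be derived by using the substitutional device of 41.12. Starting from the Wishart distribution, we note that if product-moments about the origin, say $c'$, are used instead of those about means, the distribution remains valid except that there is an extra degree of freedom. We then have

\[
dF = \frac{n^{\frac12 pn}\, |\alpha|^{\frac12 n}\, |c'|^{\frac12(n-p-1)}}{2^{\frac12 pn}\, \pi^{\frac14 p(p-1)} \prod_{j=1}^{p} \Gamma\{\tfrac12(n+1-j)\}} \exp\Bigl(-\tfrac12 n \sum_{j,k} \alpha_{jk} c'_{jk}\Bigr) \prod dc'_{jk}. \tag{41.88}
\]

As in (41.53) we find

\[
E(|c'|^{t}) = \frac{2^{pt}}{n^{pt}\,|\alpha|^{t}} \prod_{j=1}^{p} \frac{\Gamma\{\tfrac12(n+1-j)+t\}}{\Gamma\{\tfrac12(n+1-j)\}}. \tag{41.89}
\]

Now we may also write (41.88) in the form given by our original derivation of Wishart's distribution

\[
dF = \frac{|\tfrac12\beta|^{\frac12 n}}{\pi^{\frac14 p(p+1)} \prod_{j=1}^{p} \Gamma\{\tfrac12(n-j)\}}\; |c|^{\frac12(n-p-2)} \exp\Bigl(-\tfrac12 \sum_{j,k} \beta_{jk} c_{jk}\Bigr) \prod dc_{jk}
\times \exp\Bigl(-\tfrac12 \sum_{j,k} \beta_{jk} \bar{x}_j \bar{x}_k\Bigr) \prod d\bar{x}_j, \tag{41.90}
\]

where, as in 41.12, $\beta_{jk} = n\alpha_{jk}$. If we multiply this by $|c'|^{t}$ and integrate we obtain the form on the right in (41.89). Replace $n$ by $n+2u$ and divide by $|\tfrac12\beta|^{u}$. We then have, on division by appropriate constants,

\[
E(|c'|^{t}\,|c|^{u}) = \frac{2^{p(t+u)}}{n^{p(t+u)}\,|\alpha|^{t+u}} \prod_{j=1}^{p} \frac{\Gamma\{\tfrac12(n+1-j)+t+u\}\; \Gamma\{\tfrac12(n-j)+u\}}{\Gamma\{\tfrac12(n+1-j)+u\}\; \Gamma\{\tfrac12(n-j)\}}. \tag{41.91}
\]

We recall that $|c|/|c'| = |d|/|m|$. Put $t = -u$ in (41.91). We then have

\[
E\{|d|/|m|\}^{u} = \prod_{j=1}^{p} \frac{\Gamma\{\tfrac12(n+1-j)\}\; \Gamma\{\tfrac12(n-j)+u\}}{\Gamma\{\tfrac12(n+1-j)+u\}\; \Gamma\{\tfrac12(n-j)\}} \tag{41.92}
\]
\[
= \frac{\Gamma(\tfrac12 n)\; \Gamma\{\tfrac12(n-p)+u\}}{\Gamma(\tfrac12 n+u)\; \Gamma\{\tfrac12(n-p)\}}
= \frac{B\{\tfrac12(n-p)+u, \tfrac12 p\}}{B\{\tfrac12(n-p), \tfrac12 p\}}. \tag{41.93}
\]

This is the $u$th moment of

\[
dF = \frac{1}{B\{\tfrac12(n-p), \tfrac12 p\}}\; x^{\frac12(n-p-2)} (1-x)^{\frac12(p-2)}\, dx, \tag{41.94}
\]

which is uniquely determined by its moments. This then is the distribution of the ratio $|d|/|m|$, and on substitution for $T^2$ from (41.82) we arrive at (41.86).

It will be seen that the essential feature of the $T^2$ statistic is that it is of form $z'\hat{V}^{-1}z$, where $z$ is a multinormal vector with means zero and dispersion matrix $V$, while $\hat{V}$ is the ML estimator of $V$, adjusted to be unbiassed. Above, we had $z = \bar{x}$, $\hat{V} = d/\{n(n-1)\}$. Similarly, we may define a test statistic for the two-sample case (generalizing "Student's" two-sample $t$-test) with $z = (\bar{x}_1 - \bar{x}_2)$, $\hat{V} = \bigl(\frac{1}{n_1} + \frac{1}{n_2}\bigr)\hat{\gamma}$, where $\hat{\gamma}$ is the unbiassed estimator of the common dispersion matrix of the two populations, calculated from the pooled samples. The condition corresponding to $n-1 > p$ in 41.16 is here $n_1+n_2-2 > p$. For the contrary case, see Dempster (1958).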


41.18 So far we have generalized to p dimensions the normal distribution theory 
of means, dispersions, and the “ Student” ratio. It is natural to enquire whether 
there is a similar generalization of the Fisher test for the ratio of two independent 
variances. We defer a discussion of this topic to the next chapter, where it arises 
naturally in connection with tests of hypotheses. We may remark here, however, 
that exact distributions in closed form corresponding to Fisher's z, for example, are 
difficult to derive, but that moments can usually be obtained in the manner of 41.12. 
A further point of some interest is that, in generalizations of variance analysis, it is 
not the ratio of two independent dispersion determinants which arises for test, but the 
ratio of type $|c_1|/|c_1+c_2|$ where $c_1$ and $c_2$ are independent. In the univariate case we can transform simply from $s_1^{2}/s_2^{2}$ to $s_1^{2}/(s_1^{2}+s_2^{2})$, but this is no longer possible in two or more dimensions.


Large-sample results 

41.19 Even for normal variation, where we are not concerned with cumulants of order higher than the second, there is an embarrassing profusion of parameters: $p$ means, $p$ variances, and $\tfrac12 p(p-1)$ covariances (or correlations), $\tfrac12 p(p+3)$ in all. For $p$ as low as 5 there are 20, and for $p = 10$ there are 65. The distributional problems are correspondingly intractable unless we assume away many of these parameters, e.g. by considering the case of independent variables, for which all correlations vanish.

In such circumstances large-sample results are not to be despised, though they are often neglected. To the first order in $n$ we have

\[
E(c_{jk}) = \gamma_{jk}. \tag{41.95}
\]

To the same order

\[
E(c_{jk}\,c_{lm}) = \frac{1}{n^{2}}\, E\Bigl\{\sum_{\alpha} x_{j\alpha} x_{k\alpha} \sum_{\beta} x_{l\beta} x_{m\beta}\Bigr\}. \tag{41.96}
\]

If $\alpha \neq \beta$ the two sums on the right are independent. There are $n(n-1)$ such cases and the expectation is $\gamma_{jk}\gamma_{lm}$. If $\alpha = \beta$ there are $n$ terms such as $E(x_{j\alpha} x_{k\alpha} x_{l\alpha} x_{m\alpha})$. In general this involves fourth-order cumulants, but for normal variation we see from the c.f. of the $x$'s,

\[
\phi = \exp\Bigl(-\tfrac12 \sum_{j,k} \gamma_{jk} t_j t_k\Bigr),
\]

that

\[
E(x_j x_k x_l x_m) = \gamma_{jk}\gamma_{lm} + \gamma_{jl}\gamma_{km} + \gamma_{jm}\gamma_{kl}. \tag{41.97}
\]

Substitution in (41.96) then gives us

\[
E(c_{jk}\,c_{lm}) = \gamma_{jk}\gamma_{lm} + \frac{1}{n}\,(\gamma_{jl}\gamma_{km} + \gamma_{jm}\gamma_{kl})
\]

and hence

\[
\operatorname{cov}(c_{jk}, c_{lm}) = \frac{1}{n}\,(\gamma_{jl}\gamma_{km} + \gamma_{jm}\gamma_{kl}). \tag{41.98}
\]

In particular, if $j = k = l = m = 1$ we have the known result

\[
\operatorname{var} c_{11} = \frac{2\gamma_{11}^{2}}{n} = \frac{2\sigma_1^{4}}{n}. \tag{41.99}
\]

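Example 41.6
In our notation, it being indifferent whether we write parent or sample symbols,

\[
r_{12} = \frac{\gamma_{12}}{(\gamma_{11}\gamma_{22})^{\frac12}}, \qquad
\frac{dr_{12}}{r_{12}} = \frac{d\gamma_{12}}{\gamma_{12}} - \frac12 \frac{d\gamma_{11}}{\gamma_{11}} - \frac12 \frac{d\gamma_{22}}{\gamma_{22}}. \tag{41.100}
\]

Likewise

\[
\frac{dr_{13}}{r_{13}} = \frac{d\gamma_{13}}{\gamma_{13}} - \frac12 \frac{d\gamma_{11}}{\gamma_{11}} - \frac12 \frac{d\gamma_{33}}{\gamma_{33}}. \tag{41.101}
\]

Multiplying (41.100) and (41.101) and taking expectations, we have

\[
\begin{aligned}
\frac{\operatorname{cov}(r_{12}, r_{13})}{r_{12}\,r_{13}}
={}& \frac{\operatorname{cov}(\gamma_{12}, \gamma_{13})}{\gamma_{12}\gamma_{13}}
- \frac12 \frac{\operatorname{cov}(\gamma_{11}, \gamma_{12})}{\gamma_{11}\gamma_{12}}
- \frac12 \frac{\operatorname{cov}(\gamma_{12}, \gamma_{33})}{\gamma_{12}\gamma_{33}} \\
& - \frac12 \frac{\operatorname{cov}(\gamma_{11}, \gamma_{13})}{\gamma_{11}\gamma_{13}}
- \frac12 \frac{\operatorname{cov}(\gamma_{22}, \gamma_{13})}{\gamma_{22}\gamma_{13}}
+ \frac14 \frac{\operatorname{var} \gamma_{11}}{\gamma_{11}^{2}} \\
& + \frac14 \frac{\operatorname{cov}(\gamma_{11}, \gamma_{33})}{\gamma_{11}\gamma_{33}}
+ \frac14 \frac{\operatorname{cov}(\gamma_{11}, \gamma_{22})}{\gamma_{11}\gamma_{22}}
+ \frac14 \frac{\operatorname{cov}(\gamma_{22}, \gamma_{33})}{\gamma_{22}\gamma_{33}}.
\end{aligned} \tag{41.102}
\]

Relations of type (41.98) reduce this to

\[
n \operatorname{cov}(r_{12}, r_{13}) = r_{23}\,(1 - r_{12}^{2} - r_{13}^{2}) + \tfrac12\, r_{12} r_{13}\,(r_{12}^{2} + r_{13}^{2} + r_{23}^{2} - 1). \tag{41.103}
\]

The correlation of $r_{12}$ and $r_{13}$ is easily determined from the consideration that $n \operatorname{var} r = (1-r^{2})^{2}$.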
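
The reduction of (41.102) to (41.103) is a matter of routine algebra which can be checked mechanically. The following sketch, with function names of our own choosing, applies (41.98) term by term to the nine products arising from (41.100) and (41.101), taking unit variances so that $\gamma_{jk} = r_{jk}$, and compares the total with the closed form.

```python
import math

def cov_c(gamma, j, k, l, m, n):
    """Large-sample covariance of c_jk and c_lm from (41.98)."""
    return (gamma[j][l] * gamma[k][m] + gamma[j][m] * gamma[k][l]) / n

def cov_r12_r13_expansion(r12, r13, r23, n):
    """Sum of the nine terms of (41.102), multiplied by r12 * r13."""
    g = [[1.0, r12, r13], [r12, 1.0, r23], [r13, r23, 1.0]]
    # differential coefficients in (41.100) and (41.101); suffixes are 0-based
    terms_a = [((0, 1), 1.0), ((0, 0), -0.5), ((1, 1), -0.5)]
    terms_b = [((0, 2), 1.0), ((0, 0), -0.5), ((2, 2), -0.5)]
    total = 0.0
    for (j, k), wa in terms_a:
        for (l, m), wb in terms_b:
            total += wa * wb * cov_c(g, j, k, l, m, n) / (g[j][k] * g[l][m])
    return r12 * r13 * total

def cov_r12_r13_closed(r12, r13, r23, n):
    """Closed form (41.103), divided by n."""
    return (r23 * (1 - r12 ** 2 - r13 ** 2)
            + 0.5 * r12 * r13 * (r12 ** 2 + r13 ** 2 + r23 ** 2 - 1)) / n

print(cov_r12_r13_expansion(0.3, 0.5, 0.2, 50), cov_r12_r13_closed(0.3, 0.5, 0.2, 50))
```

The two values printed agree to rounding error for any admissible set of correlations with $r_{12}$ and $r_{13}$ non-zero.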


The latent roots of a dispersion matrix 

41.20 In later chapters we shall encounter several situations in which we are interested in the latent roots of a stochastic matrix. If $(c_{jk})$ is a dispersion matrix we shall wish to study the behaviour of the roots of the $p$-ic in $\lambda$

\[
|c_{jk} - \lambda\delta_{jk}| = |c - \lambda I| = 0, \tag{41.104}
\]

where $\delta_{jk}$ is the Kronecker symbol, equal to zero unless $j = k$, in which case it is equal to unity.

We take from matrix theory the result that if $c$ is a non-negative definite matrix the latent roots are all real and non-negative. Only exceptionally will the roots be equal, and if $q$ of them are zero $c$ is singular and has rank $p-q$.


41.21 In point of fact (41.104) is a particular case of a rather more general form

\[
|c - \lambda b| = 0, \tag{41.105}
\]

where $b$, $c$ are independent dispersion matrices based on $m$, $n$ observations respectively. We may write equivalently

\[
|b - u(b+c)| = 0 \tag{41.106}
\]

with

\[
u = \frac{1}{1+\lambda}, \qquad \lambda = \frac{1-u}{u}. \tag{41.107}
\]

The complexity of the distributional problem arises from the fact that the roots in $\lambda$ or $u$ are not algebraic functions of the dispersions. It is easier to derive sampling distributions of symmetric functions of the roots than of the individual roots themselves.
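
The equivalence of (41.105) and (41.106) under (41.107) may be illustrated for $p = 2$, where (41.106) is a quadratic in $u$. The sketch below is illustrative only; the matrices $b$ and $c$ are arbitrary positive definite examples and the function names are ours.

```python
import math

def det2(a):
    """Determinant of a 2 x 2 matrix."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def roots_u(b, c):
    """Roots of |b - u(b+c)| = 0 for symmetric 2 x 2 matrices b and c."""
    s = [[b[i][j] + c[i][j] for j in range(2)] for i in range(2)]
    qa = det2(s)
    qb = -(b[0][0] * s[1][1] + b[1][1] * s[0][0] - 2.0 * b[0][1] * s[0][1])
    qc = det2(b)
    disc = math.sqrt(qb * qb - 4.0 * qa * qc)
    return ((-qb + disc) / (2.0 * qa), (-qb - disc) / (2.0 * qa))

b = [[2.0, 0.5], [0.5, 1.0]]
c = [[1.5, -0.3], [-0.3, 2.5]]
for u in roots_u(b, c):
    lam = (1.0 - u) / u                               # (41.107)
    shifted = [[c[i][j] - lam * b[i][j] for j in range(2)] for i in range(2)]
    print(u, lam, det2(shifted))                      # last value is zero to rounding
```

Each root $u$ lies between 0 and 1, and the corresponding $\lambda$ satisfies (41.105).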


41.22 We assume that the parent variation is normal with unit variances and zero covariances. The joint distribution of the $c_{jk}$ and $b_{jk}$ then has the frequency function, as at (41.58),

\[
f \propto \frac{|c|^{\frac12(n-p-2)}\; |b|^{\frac12(m-p-2)}}{\prod_{j=1}^{p} \Gamma\{\tfrac12(n-j)\}\, \Gamma\{\tfrac12(m-j)\}} \exp\{-\tfrac12 \operatorname{tr}(b+c)\}. \tag{41.108}
\]

In Chapter 43 we shall see that to each root $u_i$ of (41.106) there corresponds a latent vector $l_i$ such that

\[
\{b - u_i(b+c)\}\, l_i = 0 \tag{41.109}
\]

and we may choose the $l_i$ so that

\[
l_i'(b+c)\, l_i = 1. \tag{41.110}
\]

If $l$ is the $p \times p$ matrix of latent vectors we have from (41.109)

\[
bl = (b+c)\, lu \tag{41.111}
\]

where $u$ is a diagonal matrix whose elements are $u_1, u_2, \ldots, u_p$. We now suppose that the $u$'s are arranged in descending order of magnitude, $u_1 > u_2 > \ldots > u_p$. We also have from (41.109)

\[
l_j'\, b\, l_i = u_i\, l_j'(b+c)\, l_i \tag{41.112}
\]

and by transposition ($b$ and $c$ being symmetric)

\[
l_j'\, b = u_j\, l_j'(b+c)
\]

so that

\[
l_j'\, b\, l_i = u_j\, l_j'(b+c)\, l_i. \tag{41.113}
\]

It follows from (41.112) and (41.113) that, if $u_i \neq u_j$,

\[
l_j'(b+c)\, l_i = 0. \tag{41.114}
\]

Multiplying (41.111) on the left by $l'$ and using (41.110) we have

\[
l'bl = u. \tag{41.115}
\]

From (41.110) and (41.114),

\[
l'(b+c)\, l = I \tag{41.116}
\]

and hence

\[
b+c = (l')^{-1}\, l^{-1}. \tag{41.117}
\]

Likewise from (41.115)

\[
b = (l')^{-1}\, u\, l^{-1} \tag{41.118}
\]

and hence

\[
c = (l')^{-1}\,(I-u)\, l^{-1}. \tag{41.119}
\]


41.23 Looking back to (41.108), we see that with the transformation (41.118-19) the frequency function is given by

\[
f \propto \frac{\bigl(\prod u_j\bigr)^{\frac12(m-p-2)} \bigl\{\prod (1-u_j)\bigr\}^{\frac12(n-p-2)}}{\prod_{j=1}^{p} \Gamma\{\tfrac12(n-j)\}\, \Gamma\{\tfrac12(m-j)\}} \times \text{a function of } l. \tag{41.120}
\]

We now consider the Jacobian of the transformation. It will appear that the distribution of $u$'s is independent of that of the $l$'s and the contribution from the latter may be factorized off.
We observe from (41.118) and (41.119) that there are $\tfrac12 p(p+1)$ variables in each of $b$ and $c$, $p(p+1)$ in all; and $p^2$ variables in $l$ and $p$ $u$'s, again making $p(p+1)$.
The Jacobian of $b$, $c$ is the same as that of $b$, $b+c$. Writing $g$ for $l^{-1}$ in (41.118) and (41.119), we then have to consider the Jacobian of the transformation

\[
b = g'ug, \tag{41.121}
\]
\[
b+c = g'g. \tag{41.122}
\]

We prove that $u_1 - u_2$ is a factor of the Jacobian. Although the argument is general, it will be clearer if we write it explicitly for $p = 2$.
The matrix $b$ then becomes

\[
\begin{pmatrix}
g_{11}^{2} u_1 + g_{21}^{2} u_2 & g_{11} g_{12} u_1 + g_{21} g_{22} u_2 \\
g_{11} g_{12} u_1 + g_{21} g_{22} u_2 & g_{12}^{2} u_1 + g_{22}^{2} u_2
\end{pmatrix}; \tag{41.123}
\]

$b+c$ is the same with the $u$'s put equal to unity.

For the Jacobian we have

\[
\frac{\partial\{b_{11}, b_{12}, b_{22}, (b+c)_{11}, (b+c)_{12}, (b+c)_{22}\}}{\partial(u_1, u_2, g_{11}, g_{12}, g_{21}, g_{22})} =
\begin{vmatrix}
g_{11}^{2} & g_{21}^{2} & 2g_{11}u_1 & 0 & 2g_{21}u_2 & 0 \\
g_{11}g_{12} & g_{21}g_{22} & g_{12}u_1 & g_{11}u_1 & g_{22}u_2 & g_{21}u_2 \\
g_{12}^{2} & g_{22}^{2} & 0 & 2g_{12}u_1 & 0 & 2g_{22}u_2 \\
0 & 0 & 2g_{11} & 0 & 2g_{21} & 0 \\
0 & 0 & g_{12} & g_{11} & g_{22} & g_{21} \\
0 & 0 & 0 & 2g_{12} & 0 & 2g_{22}
\end{vmatrix}.
\]

If $u_1 = u_2$ we can subtract multiples of the bottom three rows from the top three to obliterate all except the first two terms in these rows, and the determinant vanishes. It follows that every $u_j - u_k$, $j > k$, is a factor of the Jacobian. The product of these factors is of degree $\tfrac12 p(p-1)$ in the $u$'s. There can be no other terms involving $u$ because the Jacobian can be of no higher degree.

Thus, for the $u$'s alone we have from (41.120)

\[
dF = k\; \frac{\bigl(\prod u_j\bigr)^{\frac12(m-p-2)} \bigl\{\prod (1-u_j)\bigr\}^{\frac12(n-p-2)}}{\prod_{j=1}^{p} \Gamma\{\tfrac12(m-j)\}\, \Gamma\{\tfrac12(n-j)\}} \prod_{j<k} (u_j - u_k) \prod_{j} du_j, \tag{41.124}
\]

where $k$ is some constant. To evaluate it by explicit integration would be very difficult. The following indirect route suffices:

$k$ arises from terms in the original density and the Jacobian involving $p$ and $m+n$, but not $m$ separately. Write it then as $k(p, m+n)$. Note that, from (41.117) and (41.118), $|b|/|b+c| = \prod u_j$. If we increase $m$ by $2t$ in (41.124) and integrate we have the $t$th moment of $|b|/|b+c|$ except for a factor $k(p, m+n+2t)$. After the manner of 41.12 we can find this moment, and there results

\[
\frac{k(p, m+n+2t)}{k(p, m+n)} = \prod_{j=1}^{p} \frac{\Gamma\{\tfrac12(m+n-1-j)+t\}}{\Gamma\{\tfrac12(m+n-1-j)\}}. \tag{41.125}
\]

It follows that

\[
k(p, m+n) = K(p) \prod_{j=1}^{p} \Gamma\{\tfrac12(m+n-1-j)\}, \tag{41.126}
\]

where $K(p)$ is a function of $p$ only. To evaluate it, make the substitution in (41.124) of $u_j = 2v_j/n$ and let $n$ tend to infinity. Our distribution becomes

\[
dF = K(p)\; \frac{\bigl(\prod v_j\bigr)^{\frac12(m-p-2)} \exp\bigl(-\sum v_j\bigr)}{\prod_{j=1}^{p} \Gamma\{\tfrac12(m-j)\}} \prod_{j<k} (v_j - v_k) \prod_{j} dv_j.
\]

This may be evaluated by step-by-step substitutions of the type

\[
v_1 = w_1, \qquad v_j = w_j + w_1, \quad j > 1, \tag{41.127}
\]

and choosing $m$ at each stage so that the exponent of $\prod v_j$ vanishes, as we may since the result is independent of $m$. We find

\[
\frac{K(p) \prod_{j=1}^{p} \Gamma(p+1-j)}{2^{\frac12 p(p-1)} \prod_{j=1}^{p} \Gamma\{\tfrac12(p+2-j)\}} = 1.
\]

Use of the duplication formula (41.56) for the Gamma function gives us

\[
K(p) = \frac{\pi^{\frac12 p}}{\prod_{j=1}^{p} \Gamma\{\tfrac12(p+1-j)\}}. \tag{41.128}
\]

Assembling all the factors, we find finally for the distribution

\[
dF = \pi^{\frac12 p} \prod_{j=1}^{p} \frac{\Gamma\{\tfrac12(m+n-1-j)\}\; u_j^{\frac12(m-p-2)} (1-u_j)^{\frac12(n-p-2)}}{\Gamma\{\tfrac12(m-j)\}\, \Gamma\{\tfrac12(n-j)\}\, \Gamma\{\tfrac12(p+1-j)\}} \prod_{j<k} (u_j - u_k) \prod_{j} du_j, \tag{41.129}
\]

a remarkable form discovered in 1939 by Fisher, P. L. Hsu, S. N. Roy, Girshick, and Mood, independently; cf. Mood (1951).
The distribution of the $\lambda$'s is, of course, given by a simple substitution

\[
u = 1/(1+\lambda).
\]
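
Two simple checks on the constant in (41.129) are worth recording. For $p = 1$ it should reduce to the reciprocal of a Beta function, and for $p = 2$, $m = n = 4$ the density is a constant multiple of $(u_1 - u_2)$ over the region $1 > u_1 > u_2 > 0$, whose integral is $\tfrac16$, so that the constant must be 6. The sketch below, in which the function names are our own, evaluates the constant through the log-Gamma function.

```python
import math

def log_constant_41_129(m, n, p):
    """Logarithm of the constant multiplier in the root distribution (41.129)."""
    total = 0.5 * p * math.log(math.pi)
    for j in range(1, p + 1):
        total += (math.lgamma(0.5 * (m + n - 1 - j))
                  - math.lgamma(0.5 * (m - j))
                  - math.lgamma(0.5 * (n - j))
                  - math.lgamma(0.5 * (p + 1 - j)))
    return total

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

# p = 1: the constant is 1/B{(m-1)/2, (n-1)/2}
print(math.exp(log_constant_41_129(7, 9, 1)), math.exp(-log_beta(3.0, 4.0)))
# p = 2, m = n = 4: the constant is 6
print(math.exp(log_constant_41_129(4, 4, 2)))
```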


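41.24 In the case of (41.104), when the matrix $b$ reduces to the identity matrix, a slightly different result is obtained. We will quote the result for the distribution of the roots $\lambda$ in this case. The reader may care to run through the foregoing proof and modify it where necessary to obtain this result:

\[
dF = \pi^{\frac12 p}\, (\tfrac12 n)^{\frac12 p(n-1)} \prod_{j=1}^{p} \frac{\lambda_j^{\frac12(n-p-2)} \exp(-\tfrac12 n\lambda_j)}{\Gamma\{\tfrac12(n-j)\}\, \Gamma\{\tfrac12(p+1-j)\}} \prod_{j<k} (\lambda_j - \lambda_k) \prod_{j} d\lambda_j, \tag{41.130}
\]

where now the $\lambda_j$ are in descending order.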


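Example 41.7

These distributions are very intractable except in simple cases. Let us consider the case when $p = 2$. From (41.130) we have

\[
dF = \frac{\pi\, (\tfrac12 n)^{n-1}\, (\lambda_1\lambda_2)^{\frac12(n-4)} \exp\{-\tfrac12 n(\lambda_1+\lambda_2)\}}{\Gamma\{\tfrac12(n-1)\}\, \Gamma\{\tfrac12(n-2)\}\, \Gamma(1)\, \Gamma(\tfrac12)}\; (\lambda_1 - \lambda_2)\, d\lambda_1\, d\lambda_2. \tag{41.131}
\]

The duplication formula (41.56) for Gamma functions reduces the frequency function to

\[
dF = \frac{n^{n-1}\, (\lambda_1\lambda_2)^{\frac12(n-4)} \exp\{-\tfrac12 n(\lambda_1+\lambda_2)\}\, (\lambda_1 - \lambda_2)\, d\lambda_1\, d\lambda_2}{4\,\Gamma(n-2)}. \tag{41.132}
\]

If we try to integrate for $\lambda_2$ over the range 0 to $\lambda_1$, we obtain for $\lambda_1$ an Incomplete Gamma function. On the other hand, for the functions

\[
x = \lambda_1\lambda_2, \qquad y = \lambda_1 + \lambda_2
\]

we find $1/J = |\lambda_1 - \lambda_2|$ and the distribution becomes

\[
dF = \frac{n^{n-1}\, x^{\frac12(n-4)}\, e^{-\frac12 ny}\, dx\, dy}{4\,\Gamma(n-2)}, \qquad 0 \le x \le \tfrac14 y^{2}, \tag{41.133}
\]

and on integration for $x$,

\[
dF = \frac{(\tfrac12 ny)^{n-2}\, e^{-\frac12 ny}\, d(\tfrac12 ny)}{\Gamma(n-1)}, \qquad 0 \le y < \infty. \tag{41.134}
\]

In fact, in this case the determinant reduces to

\[
\begin{vmatrix} s_1^{2} - \lambda & r s_1 s_2 \\ r s_1 s_2 & s_2^{2} - \lambda \end{vmatrix}
= \lambda^{2} - (s_1^{2}+s_2^{2})\lambda + s_1^{2} s_2^{2}(1-r^{2}) = 0. \tag{41.135}
\]

The sum of the roots is thus $s_1^{2}+s_2^{2}$; $n$ times this sum is the sum of two independent $\chi^2$ variables, each with $n-1$ degrees of freedom, and has a $\chi^2$ distribution with $2n-2$ degrees of freedom, in agreement with (41.134).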


41.25 We shall discuss the large-sample theory of latent roots in Chapter 43. 


Non-central distributions 

41.26 Just as for univariate $\chi^2$, $t$, and $F$ (variance ratio), so there arise for study here non-central multivariate distributions, especially in the consideration of power-functions of tests based on $T^2$ or related statistics. As might be expected, the resulting distributions are very cumbersome. We may note particularly


(a) The non-central Wishart distribution, as to which see T. W. Anderson (1946) and
subsequent papers and his book (1958);

(b) Non-central $T^2$. Since $T^2$ is distributed in the $F$ form this is effectively a non-central $F$; see 42.22 below.


41.27 In conclusion we may note some points which we shall have no space to 
develop in detail. 


(1) The distribution of latent roots (41.129) reduces for $p = 1$ to a Beta distribution.
Foster and Rees (1957-8) therefore called it a "generalized Beta distribution."
Following a method due to S. N. Roy (1945) and Pillai (1956), they tabulated
percentiles of the largest root for $p$ = 2, 3, 4 and 5. Pillai (1966, 1967a) has
improved the method and tabulated (1964, 1965, 1967b) up to $p = 7$. Sugiyama
(1967a, b) gives expressions for the exact distribution of the largest root in (41.130)
and the largest and smallest roots in (41.129).

(2) Wagle (1962) approached the distribution problem by sampling experiments on 
an electronic computer. The task is not a light one, but results for p = 2, 3, 4, 
for all latent roots, were successfully obtained, and calculations for higher values 
are only a matter of machine time. 

(3) The Indian school, starting with some work by Mahalanobis (1930) on racial likeness, has developed some interesting work based on what is known as the $D^2$-statistic. See, for example, R. C. Bose (1936), R. C. Bose and S. N. Roy (1938), and many later papers by S. N. Roy. The statistic may be defined as

\[
D^{2} = \sum_{j,k=1}^{p} a_{jk}\, (\bar{x}_{1j} - \bar{x}_{2j})(\bar{x}_{1k} - \bar{x}_{2k}), \tag{41.136}
\]

where two samples, $x_1$ and $x_2$, are drawn from two $p$-variate populations and $(a_{jk})$ is the inverse of the pooled dispersion matrix.
The corresponding parameter

\[
\Delta^{2} = \sum_{j,k=1}^{p} \alpha_{jk}\, (\mu_{1j} - \mu_{2j})(\mu_{1k} - \mu_{2k}) \tag{41.137}
\]

is sometimes known as Mahalanobis' generalized distance.
In fact, $D^2$ is a simple function of Hotelling's $T^2$ defined for the two-sample case as in 41.17 above, i.e. $D^{2} = \bigl(\frac{1}{n_1} + \frac{1}{n_2}\bigr) T^{2}$.


(4) If $c_1$, $c_2$ follow a Wishart distribution based on $p$ variables with sample numbers $n_1$ and $n_2$ there exists a lower triangular matrix $V$ such that $c_1+c_2 = VV'$ (cf. Exercise 41.16). If $L$ is defined by $c_1 = VLV'$, then $V$ and $L$ are independently distributed and $L$ has frequency function

\[
f \propto |L|^{\frac12(n_1-p-2)}\; |I - L|^{\frac12(n_2-p-2)}.
\]

This result is originally due to Hsu (1939). See also Kshirsagar (1961). The distribution of $L$ is sometimes called "the multivariate Beta distribution."

(5) A summary of work on latent roots is given by A. T. James (1964). See also
A. T. James (1966) and Pillai (1966).
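
The relation between $D^2$ and the two-sample $T^2$ noted under (3) may be illustrated by a small computation for $p = 2$. The sketch below is an illustration only: the function name and the two samples are invented, and the pooled dispersion matrix is taken with divisor $n_1+n_2-2$, as in 41.17.

```python
import math

def two_sample_d2_t2(sample1, sample2):
    """Mahalanobis D^2 and Hotelling's two-sample T^2 for bivariate samples."""
    n1, n2 = len(sample1), len(sample2)

    def mean(s):
        return (sum(x for x, _ in s) / len(s), sum(y for _, y in s) / len(s))

    def sums(s, mu):
        sxx = sum((x - mu[0]) ** 2 for x, _ in s)
        syy = sum((y - mu[1]) ** 2 for _, y in s)
        sxy = sum((x - mu[0]) * (y - mu[1]) for x, y in s)
        return sxx, syy, sxy

    mu1, mu2 = mean(sample1), mean(sample2)
    a1, a2 = sums(sample1, mu1), sums(sample2, mu2)
    dof = n1 + n2 - 2
    g11, g22, g12 = ((a1[i] + a2[i]) / dof for i in range(3))   # pooled dispersion
    det = g11 * g22 - g12 ** 2
    dx, dy = mu1[0] - mu2[0], mu1[1] - mu2[1]
    d2 = (g22 * dx ** 2 - 2.0 * g12 * dx * dy + g11 * dy ** 2) / det   # (41.136)
    t2 = d2 / (1.0 / n1 + 1.0 / n2)
    return d2, t2

sample1 = [(0, 0), (2, 0), (0, 2), (2, 2)]
sample2 = [(3, 4), (5, 4), (3, 6), (5, 6)]
d2, t2 = two_sample_d2_t2(sample1, sample2)
print(d2, t2)
```

Here the pooled variances are each $\tfrac43$ and the pooled covariance is zero, so that $D^2 = (9+16)/\tfrac43 = 18.75$ and $T^2 = D^2/(\tfrac14+\tfrac14) = 37.5$.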


41.28 J. M. Chambers (1967) gives methods of asymptotic approximation for multi- 
variate distributions, including Edgeworth expansions (cf. 6.18, Vol. 1) and perturbation 
approximations which generalize the methods of 10.6. 


EXERCISES 


41.1 If $\bar{x}$ is the mean of a $p$-variate sample of $n$ from a normal population with mean $\mu$ and dispersion matrix $\gamma$ show that
\[
n(\bar{x} - \mu_0)'\, \gamma^{-1}\, (\bar{x} - \mu_0)
\]
is distributed as non-central $\chi^2$ with $p$ degrees of freedom and non-central parameter
\[
n(\mu - \mu_0)'\, \gamma^{-1}\, (\mu - \mu_0)
\]
where $\mu_0$ is a given vector.
(R. C. Bose, 1936)


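41.2 $\bar{x}_1$ and $\bar{x}_2$ are the sample means from two $p$-variate normal populations with means $\mu_1$, $\mu_2$ and common dispersion matrix $\gamma$. If $y = \bar{x}_1 - \bar{x}_2$ and $\nu = \mu_1 - \mu_2$ show that
\[
\frac{n_1 n_2}{n_1 + n_2}\, (y - \nu)'\, \gamma^{-1}\, (y - \nu) \le \chi^{2}_{\alpha}
\]
is a confidence region for $\nu$, $n_1$ and $n_2$ being the respective sample numbers.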


41.3 In 41.4, show that $\bar{x}$ and the sample matrix $a$ ($= c^{-1}$) are sufficient statistics for $\mu$ and $\gamma^{-1}$.


41.4 In a four-variate normal distribution show that the correlation between the covariances $c_{12}$ and $c_{34}$ is
\[
\frac{\rho_{13}\rho_{24} + \rho_{14}\rho_{23}}{\{(1+\rho_{12}^{2})(1+\rho_{34}^{2})\}^{\frac12}}.
\]
(Wishart, 1929)


41.5 A multivariate normal population has means $\mu$, all variances equal to $\sigma^2$ and all correlations equal to $\rho$. Defining

\[
(n-1)\,p\,s^{2} = \sum_{i,j} (x_{ij} - \bar{x}_j)^{2},
\]
\[
(n-1)(p-1)\,p\,s^{2} r = \sum_{i} \sum_{j \neq k} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k),
\]

show that the joint frequency of
\[
u = s^{2}\{1 + (p-1)r\}, \qquad v = s^{2}(1-r)
\]
is given by
\[
dF \propto u^{\frac12(n-3)}\, v^{\frac12(p-1)(n-1)-1} \exp\Bigl[-\tfrac12 (n-1)\Bigl\{\frac{u}{\alpha} + \frac{(p-1)v}{\theta}\Bigr\}\Bigr]\, du\, dv
\]
where
\[
\alpha = \sigma^{2}\{1 + (p-1)\rho\}, \qquad \theta = \sigma^{2}(1-\rho).
\]
Hence show that $u\theta/(v\alpha)$ is distributed in the $F$ form with $n-1$, $(p-1)(n-1)$ d.fr., and derive confidence intervals for $\rho$.
(Geisser, 1964)


41.6 $c$ is distributed in the Wishart form in samples from a population with dispersion matrix $\gamma$. Show that $h'ch$ is distributed with corresponding parameter $h'\gamma h$, $h$ being an arbitrary non-singular $p \times p$ matrix.


41.7 If $x$ is distributed $N(0, \gamma)$, i.e. normal with zero mean and dispersion matrix $\gamma$, and $M$ is an orthogonal matrix, show that $y = Mx$ is also distributed $N(0, \gamma)$. In particular if $M$ is the Helmert matrix of 41.6 show that $c$ may be represented as
\[
\sum_{i=1}^{n-1} y_i y_i'
\]
where the $y$'s are independent and distributed as $N(0, \gamma)$. Deduce the additive property of Wishart matrices of 41.10.


41.8 With reference to Example 41.3 show that the frequency function of Лу |}, 
say y, is given approximately by 


where 


„енеш, 
(Hoel, 1937) 


41.9 Show that for a sample dispersion matrix $c$, $n^{\frac12}(|c|/|\gamma| - 1)$ is distributed about zero with variance $2p$ for large samples.
(T. W. Anderson, 1958)


41.10 If a sample of л is chosen from a p-variate normal population, the variates being 
grouped into А classes xj, +++» Xp) pitts +++ 9 Xp ted ++} Xp test ee Pea tl, + ро con- 
sider the function 


where ra= 1 and "n is zero if the variates belong to different classes and equals the correlation 
ту if they belong to the same class. 
Show that the LR test of the independence of the $k$ sets of variates is
1= Wi 
and show that 


k p n Р л 
nm = IH 4 T0020 | f 700-040 
Е гин ы-ы) 
(Wilks, 1935) 


41.11 As a particular case of the last exercise, show that if a single variate x, is independent 


of a second set x4, ... , хр, then 
шн) ОГ AeA) +7} 
T {}(n—1)+r}T (ир) 
and hence find the distribution of the multiple correlation coefficient when the parent coefficient 
13 zero. 


(Wilks, 1935) 


41.12 Show algebraically that Hotelling's $T^2$ is invariant under linear transformations of the $p$ variates.


41.13 For a pair of normal variates with correlation p, show that, defining v by 
-— nO 
2,0, (1 — p°) 
we have for the frequency function of v 
— р?)і(п— 1 
$0) = ae re jj 9 1i 
for v» 0 and a similar expression with —v for v inside the curly brackets if v «0. Неге К is 


the Bessel function of second kind with imaginary argument. 
(Wishart and Bartlett, 1933) 


41.14 In equation (41.129) with $p = 2$ show that the distribution of $y = (1-u_1)(1-u_2)$
is given by
аузу A HEU 
B(m—p,n—p+1) 


dF = 


41.15 Verify equation (41.62). 


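41.16 (Bartlett decomposition.) Let $x_{jk}$, $j = 1, 2, \ldots, p$; $k = 1, 2, \ldots, n$, be independent $N(0, 1)$ variables. Take
\[
\begin{aligned}
y_1 &= x_1, \\
y_2 &= x_2 - b_{21} y_1, \\
&\;\;\vdots \\
y_p &= x_p - b_{p1} y_1 - \ldots - b_{p,p-1} y_{p-1}
\end{aligned}
\]
and take the $y$'s to be orthogonal so that $y_j' y_k = 0$, $j \neq k$. Then $b_{jk} = y_k' x_j / y_k' y_k$. Take $b^{*}_{jk} = b_{jk}\, (y_k' y_k)^{\frac12}$. Show that
\[
x'x = BB'
\]
where $B$ is the triangular matrix
\[
\begin{pmatrix}
(y_1' y_1)^{\frac12} & 0 & \cdots & 0 \\
b^{*}_{21} & (y_2' y_2)^{\frac12} & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
b^{*}_{p1} & b^{*}_{p2} & \cdots & (y_p' y_p)^{\frac12}
\end{pmatrix}.
\]
Show that the $b^{*}_{jk}$, $k = 1, \ldots, (j-1)$, are independent $N(0, 1)$ variables and hence that each $y_k' y_k$ is a $\chi^2$ variable with $n-k+1$ degrees of freedom, independent of the $b^{*}$'s and the other $y_j' y_j$.
(Bartlett, 1933. Cf. Kshirsagar, 1959)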


41.17 In the foregoing exercise, by taking determinants in the equation $x'x = BB'$, show that $y_k' y_k$ is the ratio of two product-sum determinants. Hence show that the $k$th diagonal element of the inverse of $x'x$ is distributed as the reciprocal of a $\chi^2$ variable with $n-k+1$ degrees of freedom.

(Wijsman, 1957; see also Kshirsagar, 1959)


41.18 Use the previous exercise to prove the result of 41.13, that $|d|/|\gamma|$ is the product of $p$ independent $\chi^2$ factors with degrees of freedom $n-1, n-2, \ldots, n-p$.
(Wijsman, 1957)
41.19 Verify the result (41.130). 


41.20 From (41.53) show that $|c|$ is a biassed estimator of $|\gamma|$. Show also that the bias is not removed by dividing product-sums by $n-1$ instead of $n$ to obtain covariances.


CHAPTER 42
TESTS OF HYPOTHESES IN MULTIVARIATE ANALYSIS 


42.1 The exact theory of multivariate analysis, in the present state of knowledge, 
is concerned almost entirely with normal variation, and we have seen in the previous 
chapter that ML estimators of means and dispersions are the corresponding sample 
statistics. A general theory of estimation, other than the ordinary maximum-likelihood 
method, has yet to be produced for practical use. Methods based on Bayesian prior 
probabilities or fiducial arguments have been discussed in the literature but lead to 
some rather anomalous results on occasion (see, for example, Dempster, 1966). We 
shall content ourselves with the normal ML estimators, which in any case have a certain 
plausibility. The only point to comment upon concerns bias. 

Consider, for example, the dispersion determinant $|c|$ as an estimator of the 
parent $|V|$. From equation (41.53) with t = 1 we find 

$$ E|c| = \prod_{j=1}^{p} \left(1 - \frac{j}{n}\right) |V|. \tag{42.1} $$

To order $n^{-1}$ the multiplicative factor on the right is 

$$ 1 - \frac{1}{n}\sum_{j=1}^{p} j = 1 - \frac{p(p+1)}{2n}. \tag{42.2} $$

The bias may therefore be appreciable. We may, however, remove it, or at least reduce 
it to order $n^{-2}$, either by multiplying $|c|$ by the reciprocal of the right-hand side of 
(42.2) or by the device due to Quenouille (17.10, Vol. 2). If $|c|_n$ represents the 
dispersion determinant based on n observations and $|c|_{n-1}$ the similar determinant based 
on n-1, we construct the estimator 

$$ \mathrm{Est}\,|V| = n\,|c|_n - (n-1)\,\mathrm{Av}\,|c|_{n-1} \tag{42.3} $$

where the average on the right is taken over all the possible determinants obtained 
by dropping one observation. We then have, to order $n^{-1}$, 

$$ \frac{E(\mathrm{Est}\,|V|)}{|V|} = n\left\{1 - \frac{p(p+1)}{2n}\right\} - (n-1)\left\{1 - \frac{p(p+1)}{2(n-1)}\right\} = 1, $$

and the estimator is unbiassed to order $n^{-1}$. The idea is quite straightforward; the 
difficulty in applying it lies in the amount of calculation involved in computing all 
the dispersion determinants, though this is not insuperable for an electronic computer. 
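
The bias argument of 42.1 can be checked numerically. The following is a minimal sketch, assuming the exact factor $E|c|/|V| = \prod_{j=1}^{p}(1-j/n)$ read off from (42.1); the function names are illustrative only and do not come from the text. 

```python
def bias_factor(n, p):
    """Exact E|c| / |V| as in (42.1): product of (1 - j/n) for j = 1..p."""
    out = 1.0
    for j in range(1, p + 1):
        out *= 1.0 - j / n
    return out


def first_order_factor(n, p):
    """The factor of (42.2), correct to order 1/n."""
    return 1.0 - p * (p + 1) / (2.0 * n)


def jackknifed_factor(n, p):
    """E(Est|V|) / |V| for the Quenouille estimator (42.3).

    Each determinant based on n - 1 observations has expectation
    bias_factor(n - 1, p) * |V|, so the average has the same expectation.
    """
    return n * bias_factor(n, p) - (n - 1) * bias_factor(n - 1, p)


if __name__ == "__main__":
    for n in (20, 60, 200):
        print(n, bias_factor(n, 2), first_order_factor(n, 2), jackknifed_factor(n, 2))
```

For p = 2 the jackknifed factor works out algebraically to 1 - 2/{n(n-1)}, so the remaining bias is of order $n^{-2}$, as stated. 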


42.2 For the remainder of this chapter, we shall discuss tests of hypotheses. 
A review of confidence regions, tolerance regions and prediction regions for the 
multinormal distribution is given by Chew (1966). 


Homogeneity tests 

42.3 A natural generalization of the variance analysis considered in Chapter 35 
arises if we consider samples from k different p-variate populations and enquire whether 
the parents may be identical. There are, as usual, three types of hypothesis to 
consider: 
$H$: that the populations have the same means and dispersions, namely are identical; 
$H_1$: that the populations have the same dispersions but may differ in the means; 
$H_2$: it is known that the populations have the same dispersions; the hypothesis is that 
they have the same means. 


There are, of course, hybrid hypotheses, e.g. given certain dispersions but not 
others, that the others are equal. 


42.4 For testing simple hypotheses, the Neyman-Pearson lemma of 22.10 (Vol. 2) 
applies to multivariate distributions without change. Similarly the likelihood-ratio 
method of Chapter 24, with the same plausibility, may be adopted as a test statistic 
for composite hypotheses. 

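One property of maximization procedures is worth noticing. If we are maximizing, 
say, $f(\theta_1, \theta_2)$ for variations in $\theta_1$ and $\theta_2$, we may solve the simultaneous equations 

$$ \frac{\partial f}{\partial\theta_1} = 0, \qquad \frac{\partial f}{\partial\theta_2} = 0. \tag{42.4} $$

It is, however, equivalent to solve $\partial f/\partial\theta_1 = 0$ for $\theta_1$, substitute in $f$, and then solve 
$df/d\theta_2 = 0$. 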
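
The stepwise maximization just described can be illustrated on the simplest case. The sketch below is an assumption-laden illustration, not part of the text: it uses a univariate normal log-likelihood with $\theta_1$ the mean and $\theta_2$ the variance, and the function names are hypothetical. 

```python
import math


def loglik(mu, var, xs):
    """Normal log-likelihood of the sample xs for mean mu and variance var."""
    n = len(xs)
    return -0.5 * n * math.log(2 * math.pi * var) - sum((x - mu) ** 2 for x in xs) / (2 * var)


def maximise_stepwise(xs):
    """Solve df/d(mu) = 0 first, substitute, then solve df/d(var) = 0."""
    n = len(xs)
    # Step 1: for any fixed variance the maximizing mean is the sample mean.
    mu_hat = sum(xs) / n
    # Step 2: with mu_hat substituted, the maximizing variance is the mean
    # squared deviation (divisor n, as in the text's convention).
    var_hat = sum((x - mu_hat) ** 2 for x in xs) / n
    return mu_hat, var_hat


if __name__ == "__main__":
    print(maximise_stepwise([1.0, 2.0, 3.0, 4.0]))
```

The point found in two steps is also a maximum of the joint function: moving either parameter away from it lowers the log-likelihood. 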


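42.5 Consider, then, k multivariate normal populations with means typified by $\mu_{jt}$ 
(j = 1, 2, ..., p; t = 1, 2, ..., k) and dispersions by $v_{jlt}$ or equivalently $\sigma_{jt}\sigma_{lt}\rho_{jlt}$. 
Let there be a sample of $n_t$ from the tth population. If $\alpha_{jlt}$ is inverse to $v_{jlt}$, the 
likelihood function of all samples together is 

$$ L = \prod_{t=1}^{k} \frac{|\alpha_{jlt}|^{n_t/2}}{(2\pi)^{pn_t/2}} \exp\left\{ -\tfrac12 \sum_{j,l=1}^{p} \alpha_{jlt} \sum_{i=1}^{n_t} (x_{jti}-\mu_{jt})(x_{lti}-\mu_{lt}) \right\}. \tag{42.5} $$

If all $\mu$'s and $v$'s are equal, the corresponding likelihood is 

$$ L = \frac{|\alpha_{jl}|^{n/2}}{(2\pi)^{pn/2}} \exp\left\{ -\tfrac12 \sum_{j,l=1}^{p} \alpha_{jl} \sum_{t=1}^{k}\sum_{i=1}^{n_t} (x_{jti}-\mu_{j})(x_{lti}-\mu_{l}) \right\} \tag{42.6} $$

where 

$$ n = \sum_{t=1}^{k} n_t. \tag{42.7} $$

In accordance with the usual procedure (Chapter 24), we estimate the parameters 
in (42.5) by ML and substitute in it to obtain the unconditioned maximum $L_1$. Likewise 
for (42.6) to obtain the conditioned maximum $L_0$. We then use the ratio $l = L_0/L_1$, 
or some monotonic function of it, as the test criterion. 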
The logarithm of the likelihood (42.5) becomes the sum of k terms which, being 
independent, can be maximized separately. We find, as expected, 

$$ \hat\mu_{jt} = \bar x_{jt} \tag{42.8} $$
$$ \hat\sigma_{jt} = s_{jt} \tag{42.9} $$
$$ \hat v_{jlt} = c_{jlt} \tag{42.10} $$

Substitution in the exponential term in (42.5) yields a constant, for 

$$ \sum_{l} \hat\alpha_{jlt}\, c_{jlt} = 1. \tag{42.11} $$

Thus, except for a constant, 

$$ L_1 = \prod_{t=1}^{k} \frac{1}{|c_{jlt}|^{n_t/2}}. \tag{42.12} $$

Likewise from (42.6) we obtain 

$$ L_0 = \frac{1}{|c_{jl0}|^{n/2}} \tag{42.13} $$

where $c_{jl0}$ is the dispersion for all k samples pooled together. The test criterion is 
then given by 

$$ l_H = \frac{L_0}{L_1} = \frac{\prod_{t=1}^{k} |c_{jlt}|^{n_t/2}}{|c_{jl0}|^{n/2}} = \prod_{t=1}^{k} \left\{ \frac{|c_{jlt}|}{|c_{jl0}|} \right\}^{n_t/2}. \tag{42.14} $$

As in the univariate case, $l$ may vary from 0 to 1. The nearer to unity, the more 
we are inclined to accept the hypothesis that all means and all dispersions are equal. 

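42.6 The same technique gives us tests for $H_1$ and $H_2$. We quote the results 
without proof. 
Let $c_{jl}$ be the average dispersion taken over the k populations, namely 

$$ c_{jl} = \frac{1}{n} \sum_{t=1}^{k} n_t\, c_{jlt}. \tag{42.15} $$

For $H_1$, 

$$ l_{H_1} = \prod_{t=1}^{k} \left\{ \frac{|c_{jlt}|}{|c_{jl}|} \right\}^{n_t/2}. \tag{42.16} $$

For $H_2$, 

$$ l_{H_2} = \left\{ \frac{|c_{jl}|}{|c_{jl0}|} \right\}^{n/2}. \tag{42.17} $$

We note that, as in the univariate case (cf. Exercise 24.6), 

$$ l_H = l_{H_1}\, l_{H_2}. \tag{42.18} $$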

Our test criteria thus appear as the ratios of dispersion determinants. 


42.7 To apply the tests we require the distributions of the criteria. In a few 
cases they can be obtained explicitly. In all cases we can obtain moments after the 
manner of 41.12. For practical purposes, however, it is enough to rely on an approximation 
due to Wilks, to the effect that $-2 \log l$ is distributed as $\chi^2$ with d.fr. equal 
to the number of constraints imposed by the hypothesis, i.e. the number of parameters 
under estimate in the unconditional form minus the number in the conditional form. 
The proof is a simple extension of that in 24.7 (Vol. 2). 
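
The count of constraints is easily mechanized. The sketch below is illustrative only (the function names are not from the text); it counts p means and p(p+1)/2 distinct dispersion elements per population and takes differences between the hypotheses $H$, $H_1$ and $H_2$ of 42.3. 

```python
def n_params(p, k, common_means, common_dispersions):
    """Number of free parameters for k p-variate normal populations."""
    means = p if common_means else p * k
    per_matrix = p * (p + 1) // 2  # distinct elements of a symmetric p x p matrix
    dispersions = per_matrix if common_dispersions else k * per_matrix
    return means + dispersions


def wilks_df(p, k):
    """Degrees of freedom of the chi-squared approximation for H, H1 and H2."""
    full = n_params(p, k, False, False)       # nothing in common
    same_disp = n_params(p, k, False, True)   # common dispersions only
    identical = n_params(p, k, True, True)    # identical populations
    return {"H": full - identical, "H1": full - same_disp, "H2": same_disp - identical}


if __name__ == "__main__":
    print(wilks_df(2, 5))
```

The three counts are ½(k-1)p(p+3), ½(k-1)p(p+1) and p(k-1) respectively, and the first is the sum of the other two. 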

We have left the criteria $l$ in the form in which they naturally arise. Clearly any 
power of $l$ would serve our purpose equally well. In particular, we might use the 
(2/n)th power, in which case the criterion for $H_2$ in (42.17) becomes the ratio of determinants 
and it is $-n$ times the logarithm of this ratio which is distributed as $\chi^2$. 


42.8 Consider now the moments of the LR criterion (42.14) for testing $H$. We 
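have 

$$ c_{jl0} = \frac{1}{n} \sum_{t=1}^{k}\sum_{i=1}^{n_t} (x_{jti}-\bar x_{j})(x_{lti}-\bar x_{l}) $$
$$ \qquad = \frac{1}{n} \sum_{t}\sum_{i} (x_{jti}-\bar x_{jt})(x_{lti}-\bar x_{lt}) + \frac{1}{n} \sum_{t} n_t (\bar x_{jt}-\bar x_{j})(\bar x_{lt}-\bar x_{l}) $$
$$ \qquad = c_{jl} + c_{jlm}, \tag{42.19} $$

where $c_{jlm}$ is the dispersion of sample means about pooled means. Following the 
device used in 41.17 we can write the likelihood of dispersions in two ways, one involving 
$|c_{jl0}|$ and the other involving $|c_{jl}|$ and $|c_{jlm}|$. We then find 

$$ E(l_H^r) = \left[ \frac{n^{pn/2}}{\prod_{t=1}^{k} n_t^{pn_t/2}} \right]^r \prod_{j=1}^{p} \left\{ \frac{\Gamma[\tfrac12(n-j)]}{\Gamma[\tfrac12\{n(1+r)-j\}]} \prod_{t=1}^{k} \frac{\Gamma[\tfrac12\{n_t(1+r)-j\}]}{\Gamma[\tfrac12(n_t-j)]} \right\}. \tag{42.20} $$

This and the following results are due to Wilks (1932), to whom reference may be made 
for details. 
In a similar way it may be shown that 

$$ E(l_{H_1}^r) = \left[ \frac{n^{pn/2}}{\prod_{t=1}^{k} n_t^{pn_t/2}} \right]^r \prod_{j=1}^{p} \left\{ \frac{\Gamma[\tfrac12(n-k+1-j)]}{\Gamma[\tfrac12\{n(1+r)-k+1-j\}]} \prod_{t=1}^{k} \frac{\Gamma[\tfrac12\{n_t(1+r)-j\}]}{\Gamma[\tfrac12(n_t-j)]} \right\} \tag{42.21} $$

$$ E(l_{H_2}^r) = \prod_{j=1}^{p} \frac{\Gamma[\tfrac12\{n(1+r)-k+1-j\}]\,\Gamma[\tfrac12(n-j)]}{\Gamma[\tfrac12(n-k+1-j)]\,\Gamma[\tfrac12\{n(1+r)-j\}]}. \tag{42.22} $$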
Note that as in Exercise 24.6 the moments of $l_H$ are the product of the moments of 
the other two $l$'s. This implies that $l_{H_1}$ and $l_{H_2}$ are independent when $H$ holds, which 
is what we might expect from the independence of means and dispersions. 
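
The product relation between the moments can be verified numerically. The sketch below is a check under stated assumptions, not a transcription of the original: it takes the moments of the three criteria in their usual Gamma-function forms (Wilks, 1932), with $n_t$ and $n$ as sample numbers and divisor n in the dispersions, and evaluates them through `math.lgamma`. The function names are illustrative. 

```python
from math import exp, lgamma, log


def _log_const(ns, p, r):
    """r * log{ n^(pn/2) / prod_t n_t^(p n_t / 2) }."""
    n = sum(ns)
    return r * (0.5 * p * n * log(n) - sum(0.5 * p * nt * log(nt) for nt in ns))


def _sample_terms(ns, j, r):
    """Sum over samples of log Gamma[(n_t(1+r)-j)/2] - log Gamma[(n_t-j)/2]."""
    return sum(lgamma(0.5 * (nt * (1 + r) - j)) - lgamma(0.5 * (nt - j)) for nt in ns)


def moment_h(ns, p, r):
    """E(l_H^r): all means and dispersions equal."""
    n = sum(ns)
    s = _log_const(ns, p, r)
    for j in range(1, p + 1):
        s += lgamma(0.5 * (n - j)) - lgamma(0.5 * (n * (1 + r) - j)) + _sample_terms(ns, j, r)
    return exp(s)


def moment_h1(ns, p, r):
    """E(l_H1^r): dispersions equal, means unrestricted."""
    n, k = sum(ns), len(ns)
    s = _log_const(ns, p, r)
    for j in range(1, p + 1):
        s += lgamma(0.5 * (n - k + 1 - j)) - lgamma(0.5 * (n * (1 + r) - k + 1 - j))
        s += _sample_terms(ns, j, r)
    return exp(s)


def moment_h2(ns, p, r):
    """E(l_H2^r): means equal, given equal dispersions."""
    n, k = sum(ns), len(ns)
    s = 0.0
    for j in range(1, p + 1):
        s += lgamma(0.5 * (n * (1 + r) - k + 1 - j)) + lgamma(0.5 * (n - j))
        s -= lgamma(0.5 * (n - k + 1 - j)) + lgamma(0.5 * (n * (1 + r) - j))
    return exp(s)


if __name__ == "__main__":
    ns = [12] * 5
    print(moment_h(ns, 2, 1.0), moment_h1(ns, 2, 1.0) * moment_h2(ns, 2, 1.0))
```

With five samples of twelve and p = 2 the two printed values agree to rounding error, which is the numerical counterpart of the factorization $l_H = l_{H_1} l_{H_2}$ with independent factors. 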


42.9 In passing we may remark on a possible source of confusion. In our notation 
n is the sample number, not the degrees of freedom. The form of the frequency 
distribution as written, for example, at (42.5) contains the n's only in the preliminary 
constants. If the exponent were reducible to a quadratic form which transformed to 
a sum of p-q squares the appropriate preliminary constants would have p-q instead 
of p. And if the sum over sample values were equivalent to n-q instead of n values 
we should have n-q instead of n in the constants. Whether this affects the exponents 
in (42.5) and (42.6) depends on how we define the dispersions. In our usage the 
divisor is always n. Some writers use $\nu_t = n_t - 1$ instead of our $n_t$ and $\nu = n - k$ 
instead of our n, in defining dispersions. The reason for so doing is the one noticed 
in Example 24.6, Vol. 2: the test is nearer to being unbiassed. 


42.10 For p = 1 the distributions reduce, of course, to those already familiar 
in univariate theory (cf. Exercises 24.4-6). The reader may care to verify as an exercise 
that this is so. 
For p = 2 we find from (42.22) for the moments of $l_{H_2}$ 

$$ \mu'_r = \frac{\Gamma[\tfrac12\{n(1+r)-k\}]\,\Gamma[\tfrac12\{n(1+r)-k-1\}]\,\Gamma\{\tfrac12(n-1)\}\,\Gamma\{\tfrac12(n-2)\}}{\Gamma[\tfrac12\{n(1+r)-1\}]\,\Gamma[\tfrac12\{n(1+r)-2\}]\,\Gamma\{\tfrac12(n-k)\}\,\Gamma\{\tfrac12(n-k-1)\}}. \tag{42.23} $$

Use of the duplication formula (41.56) for the Gamma function reduces this to 

$$ \mu'_r = \frac{\Gamma\{n(1+r)-k-1\}\,\Gamma(n-2)}{\Gamma(n-k-1)\,\Gamma\{n(1+r)-2\}} = \frac{B\{n(1+r)-k-1,\ k-1\}}{B(n-k-1,\ k-1)}. \tag{42.24} $$

The moments of $(l_{H_2})^{1/n}$, namely $(|c_{jl}|/|c_{jl0}|)^{1/2}$, are then those of 

$$ dF = \frac{1}{B(n-k-1,\ k-1)}\, x^{n-k-2} (1-x)^{k-2}\, dx, \qquad 0 \le x \le 1. \tag{42.25} $$

If p is even, we can use the duplication formula for the Gamma function to reduce 
the moments of the $l$-criteria to products of Beta functions, and the criteria are revealed 
as the product of certain independent Type I variables. This permits the expression 
of the distribution functions in closed form when either p or the number of constraints 
imposed by the hypothesis is even; cf. Schatzoff (1966b), who gives corrective tables 
to the asymptotic $\chi^2$ approximations. 


42.11 The most useful results for testing hypotheses in practice are asymptotic 
expressions. Following the treatment by Box (1949), we shall develop a general 
method along the lines of Example 41.4 in the previous chapter. The method, in 
fact, is applicable to a wide range of criteria depending on likelihood ratios. 

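Consider a variable W with moments 

$$ E(W^r) = \text{constant} \times \left[ \frac{\prod_{j=1}^{k} y_j^{\,y_j}}{\prod_{i=1}^{m} x_i^{\,x_i}} \right]^{r} \frac{\prod_{i=1}^{m} \Gamma\{x_i(1+r)+\xi_i\}}{\prod_{j=1}^{k} \Gamma\{y_j(1+r)+\eta_j\}} \tag{42.26} $$

where 

$$ \sum_{i=1}^{m} x_i = \sum_{j=1}^{k} y_j. \tag{42.27} $$

In our treatment $x_i$ and $y_j$ will be large, of the order of n, the total sample number, 
and we may write O(n) indifferently for O(x) or O(y). 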

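Now take 

$$ M = -2 \log W \tag{42.28} $$

and let us find the characteristic function of $\rho M$, where $\rho$ lies between 0 and 1 and is 
a scaling constant which we may later choose at convenience. Taking t as the dummy 
variable in the c.f., we have for $\rho M$ 

$$ \phi(t) = E\{\exp(it\rho M)\} = E(W^{-2it\rho}) = \text{constant} \times \left[ \frac{\prod_{j} y_j^{\,y_j}}{\prod_{i} x_i^{\,x_i}} \right]^{-2it\rho} \frac{\prod_{i=1}^{m} \Gamma\{x_i(1-2it\rho)+\xi_i\}}{\prod_{j=1}^{k} \Gamma\{y_j(1-2it\rho)+\eta_j\}}. \tag{42.29} $$

Putting now 

$$ (1-\rho)x_i = \beta_i, \qquad (1-\rho)y_j = \varepsilon_j, \tag{42.30} $$

we have for the cumulant-generating function of $\rho M$ 

$$ \psi(t) = g(t) - g(0) $$

where 

$$ g(t) = 2\rho it \left[ \sum_{i=1}^{m} x_i \log x_i - \sum_{j=1}^{k} y_j \log y_j \right] + \sum_{i=1}^{m} \log \Gamma\{\rho x_i(1-2it)+\beta_i+\xi_i\} - \sum_{j=1}^{k} \log \Gamma\{\rho y_j(1-2it)+\varepsilon_j+\eta_j\}. \tag{42.31} $$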
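We now use the expansion of the Gamma function (valid for complex values) 

$$ \log \Gamma(x+h) = \log \sqrt{2\pi} + (x+h-\tfrac12) \log x - x - \sum_{r=1}^{s} \frac{(-1)^r B_{r+1}(h)}{r(r+1)\,x^r} + R_{s+1}(x) \tag{42.32} $$

where the B's are Bernoulli polynomials of order unity (3.25, Vol. 1). We then find, 
on expanding (42.31) and the corresponding g(0), 

$$ \psi(t) = -\tfrac12 f \log(1-2it) + \sum_{r \ge 1} \omega_r \{(1-2it)^{-r} - 1\} \tag{42.33} $$

where 

$$ f = -2 \left\{ \sum_{i=1}^{m} \xi_i - \sum_{j=1}^{k} \eta_j - \tfrac12 (m-k) \right\} \tag{42.34} $$

$$ \omega_r = \frac{(-1)^{r+1}}{r(r+1)} \left\{ \sum_{i=1}^{m} \frac{B_{r+1}(\beta_i+\xi_i)}{(\rho x_i)^r} - \sum_{j=1}^{k} \frac{B_{r+1}(\varepsilon_j+\eta_j)}{(\rho y_j)^r} \right\}. \tag{42.35} $$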
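We must remember that, from (42.30), $\beta$ and $\varepsilon$ are of order n unless $1-\rho$ is small. 
For $\rho = 1$ we have $\omega_r = O(n^{-r})$. Thus we have 

$$ \psi(t) = -\tfrac12 f \log(1-2it) + O(n^{-1}), \tag{42.36} $$

and hence, to this order, $-2 \log W$ is distributed as $\chi^2$ with f degrees of freedom. 
Taking the approximation one stage further, we find, since 

$$ B_2(x) = x^2 - x + \tfrac16, $$

$$ \omega_1 = \frac{1}{2\rho} \left\{ \sum_{i=1}^{m} \frac{B_2(\beta_i+\xi_i)}{x_i} - \sum_{j=1}^{k} \frac{B_2(\varepsilon_j+\eta_j)}{y_j} \right\} $$

which, by use of (42.30), reduces to 

$$ \omega_1 = \frac{1}{2\rho} \left\{ -(1-\rho) f + \sum_{i=1}^{m} \frac{\xi_i^2 - \xi_i + \tfrac16}{x_i} - \sum_{j=1}^{k} \frac{\eta_j^2 - \eta_j + \tfrac16}{y_j} \right\}. \tag{42.37} $$

If we now take $\rho$ such that $\omega_1 = 0$ we have 

$$ \psi(t) = -\tfrac12 f \log(1-2it) + O(n^{-2}). \tag{42.38} $$

In general then there exists a constant $\rho$ such that $-2\rho \log W$ is distributed as $\chi^2$ with 
f degrees of freedom to order $n^{-2}$. 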
Box (1949) has pushed the investigation a good deal further, but, as we have re- 
marked, the cruder approximation (42.36) is usually good enough for practical purposes. 
See also Lawley (1956b), whose work was summarized in 24.9, Vol. 2. 
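
The two constants of the approximation are simple sums and can be computed directly from the quantities $x_i, \xi_i, y_j, \eta_j$ of (42.26). The sketch below is an illustration under the assumption that f is given by (42.34) and that $\rho$ is the root of $\omega_1 = 0$, namely $\rho = 1 - f^{-1}\{\sum_i (\xi_i^2-\xi_i+\tfrac16)/x_i - \sum_j (\eta_j^2-\eta_j+\tfrac16)/y_j\}$; the function names and the helper for the homogeneity criterion are hypothetical. 

```python
def box_f_and_rho(x, xi, y, eta):
    """Degrees of freedom f and scale factor rho for -2*rho*log W."""
    m, k = len(x), len(y)
    f = -2.0 * (sum(xi) - sum(eta) - 0.5 * (m - k))

    def b2(h):
        # Second Bernoulli polynomial B_2(h) = h^2 - h + 1/6.
        return h * h - h + 1.0 / 6.0

    correction = sum(b2(h) / xv for xv, h in zip(x, xi)) - sum(b2(h) / yv for yv, h in zip(y, eta))
    return f, 1.0 - correction / f


def homogeneity_case(ns, p):
    """Constants for the criterion l_H with sample numbers ns and p variates.

    Uses x = n_t/2, xi = -j/2 (one term per sample and variate) and
    y = n/2, eta = -j/2 (one term per variate).
    """
    n = sum(ns)
    x = [0.5 * nt for nt in ns for _ in range(p)]
    xi = [-0.5 * j for _ in ns for j in range(1, p + 1)]
    y = [0.5 * n] * p
    eta = [-0.5 * j for j in range(1, p + 1)]
    return box_f_and_rho(x, xi, y, eta)


if __name__ == "__main__":
    print(homogeneity_case([12] * 5, 2))
```

For five samples of twelve and p = 2 this gives f = 20 and $\rho$ close to 0.877, so the refinement lowers the first approximation $-2 \log l$ by about one-eighth. 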


Example 42.1 
Let us find the $\chi^2$ approximations to the distribution of $l_H$, the moments of which 
are given by (42.20). Comparison with (42.26) shows that they are of the required 
form with 

$$ k = p, \quad y_j = \tfrac12 n, \quad \eta_j = -\tfrac12 j; \quad j = 1, 2, \ldots, p. $$
$$ m = pk, \quad x_i = \tfrac12 n_t, \quad t = 1, 2, \ldots, k; \quad \xi_i = -\tfrac12 j. $$

Our first approximation is that $-2 \log l$ is distributed as $\chi^2$ with degrees of freedom 
given by (42.34), namely 

$$ f = -2\{(-\tfrac14)kp(p+1) - (-\tfrac14)p(p+1) - \tfrac12(pk-p)\} = \tfrac12 (k-1)p(p+3). \tag{42.39} $$

This is, in fact, the number of parameters in the likelihood (42.5) less the number 
in (42.6), i.e., the number of constraints imposed by the hypothesis. 

For a second approximation we find from (42.37), with $\omega_1 = 0$, 

$$ 0 = -(1-\rho)\,\tfrac12 (k-1)p(p+3) + \left( \sum_{t=1}^{k} \frac{1}{n_t} - \frac{1}{n} \right) \frac{p(2p^2+9p+11)}{12}, $$

$$ \rho = 1 - \left( \sum_{t=1}^{k} \frac{1}{n_t} - \frac{1}{n} \right) \frac{2p^2+9p+11}{6(k-1)(p+3)}. \tag{42.40} $$

In expressions of this kind, it should be remembered that, in our convention, 
$n_t$ and $n$ are sample numbers. As we noted above, results are sometimes quoted in 
the literature for criteria based on the degrees of freedom $\nu_t = n_t - 1$ and $\nu = n - k$. 
This does not affect (42.39) but makes a difference to the second term in (42.40). 
In this case $\xi_j$ is $\tfrac12(1-j)$ and $\eta_j = \tfrac12(k-j)$ and the corresponding expression to (42.40) 


1s 
=1- (2 1_1)_2*+3p-1 A") 
Pc (&z Jut» pss: j) Чо 


Tests of independence 

42.12 The set of tests which we proceed to develop are, with few exceptions, all 
based on the foregoing ideas: the deduction of a likelihood criterion, the ascertainment 
of its moments, and the approximation to a $\chi^2$ test or something a little more refined. 
We need not spend too much time on the derivation of the details, which may be left 
for verification to the student. 

First of all we consider a test of independence. Given, as usual, a sample of n 
from a p-variate normal population, and given a division of the variables into q 
subsets containing $p_1, p_2, \ldots, p_q$ variables, it is required to test the hypothesis that each 
subset is independent of the others. We shall be particularly interested in the cases 
q = 2 and q = p. 
If the parent dispersion matrix V is partitioned into $q^2$ components 

$$ \begin{pmatrix} V_{11} & V_{12} & \cdots & V_{1q} \\ V_{21} & V_{22} & \cdots & V_{2q} \\ \vdots & \vdots & & \vdots \\ V_{q1} & V_{q2} & \cdots & V_{qq} \end{pmatrix} \tag{42.42} $$

the hypothesis under test is that 

$$ V_{jl} = 0, \qquad j \ne l. \tag{42.43} $$

We find for the likelihood ratio criterion, the alternative against (42.43) being (42.42), 

$$ l = \frac{|c|^{n/2}}{\prod_{j=1}^{q} |c_{jj}|^{n/2}} \tag{42.44} $$

where as usual $(c_{jl})$ is the sample dispersion matrix. We can write equivalently 

$$ l^{2/n} = \frac{|c|}{\prod_{j=1}^{q} |c_{jj}|}. \tag{42.45} $$

Under the hypothesis, $l$ is independent of its denominator and $|c_{jj}|$ is independent of 
$|c_{ll}|$. We then find from (41.53) 

$$ E(l^r) = \frac{\prod_{i=1}^{p} \Gamma[\tfrac12\{n(1+r)-i\}] \ \prod_{j=1}^{q}\prod_{i=1}^{p_j} \Gamma\{\tfrac12(n-i)\}}{\prod_{i=1}^{p} \Gamma\{\tfrac12(n-i)\} \ \prod_{j=1}^{q}\prod_{i=1}^{p_j} \Gamma[\tfrac12\{n(1+r)-i\}]}. \tag{42.46} $$

$-2 \log l$ is distributed approximately as $\chi^2$ with 

$$ f = \tfrac12 \left\{ p(p+1) - \sum_{j=1}^{q} p_j(p_j+1) \right\} \tag{42.47} $$

d.fr. For the more accurate approximation we have 

$$ \rho = 1 - \frac{2(p^3 - \sum p_j^3) + 9(p^2 - \sum p_j^2)}{6n(p^2 - \sum p_j^2)}. \tag{42.48} $$

In the case $p_j = 1$, all j, when we are testing the independence of all p variables, 

$$ f = \tfrac12 p(p-1) \tag{42.49} $$
$$ \rho = 1 - \{(2p+11)/6n\}. \tag{42.50} $$

The criterion in this case is the ½nth power of $|c|$ divided by the product of the diagonal 
elements, namely the variances; or, equivalently, the ½nth power of the correlation 
determinant. 


Consul (1967a) obtains the exact distribution of (42.45) for q = 2, $p_1 < 6$ and all $p_2$, 
and for many cases with q = 3, extending work by Wilks (1935); see also T. W. 
Anderson (1958). Daly (1940) and Narain (1950) showed that this LR test of independence 
is unbiassed. 
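
For the case in which all p variables are tested for mutual independence the computation is short. The following sketch is an illustration only: it assumes the criterion is the correlation determinant raised to the power ½n, uses f and $\rho$ from (42.49) and (42.50), and applies them to a made-up 3 x 3 correlation matrix; the helper `det` and all names are hypothetical. 

```python
import math


def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in a]
    size = len(m)
    d = 1.0
    for c in range(size):
        piv = max(range(c, size), key=lambda r: abs(m[r][c]))
        if m[piv][c] == 0.0:
            return 0.0
        if piv != c:
            m[c], m[piv] = m[piv], m[c]
            d = -d
        d *= m[c][c]
        for r in range(c + 1, size):
            fac = m[r][c] / m[c][c]
            for cc in range(c, size):
                m[r][cc] -= fac * m[c][cc]
    return d


def independence_test(corr, n):
    """Return (-2*rho*log l, f) for complete independence of p variables.

    corr is the sample correlation matrix and n the sample number;
    -2 log l = -n log|corr| because l is |corr| to the power n/2.
    """
    p = len(corr)
    f = p * (p - 1) // 2
    rho = 1.0 - (2 * p + 11) / (6.0 * n)
    return -n * rho * math.log(det(corr)), f


if __name__ == "__main__":
    r = [[1.0, 0.5, 0.3], [0.5, 1.0, 0.2], [0.3, 0.2, 1.0]]  # illustrative values
    print(independence_test(r, 50))
```

The returned statistic is referred to the $\chi^2$ distribution with f degrees of freedom. 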


Sphericity test 

42.13 We next consider a test whether an observed dispersion matrix c can have 
arisen from a population with a matrix proportional to a given matrix V. Since V 
is known, we can transform it by a linear transformation to the identity matrix. Writing 
c for the corresponding transform of the observed matrix, we then have to test whether 
c has arisen from $\sigma^2 I$ where $\sigma^2$ is unknown. This is described, for obvious reasons, 
as a sphericity test (Mauchly, 1940). We now find for the criterion 

$$ l = \left\{ \frac{|c|}{(\operatorname{tr} c / p)^p} \right\}^{n/2}. \tag{42.51} $$

The moments of $l^{2/n}$ are given by 

$$ \mu'_r = E(l^{2r/n}) = \frac{p^{pr}\, \Gamma\{\tfrac12 p(n-1)\}}{\Gamma\{\tfrac12 p(n-1) + pr\}} \prod_{j=1}^{p} \frac{\Gamma\{\tfrac12(n-j) + r\}}{\Gamma\{\tfrac12(n-j)\}} \tag{42.52} $$

and as usual $-n \log l^{2/n}$ is distributed as $\chi^2$ with 

$$ f = \tfrac12 p(p+1) - 1. \tag{42.53} $$
For the second approximation 
2p*+pt+2 
= 1-4_+_. 42.54 
Р 6р1) (42.54) 


Consul (1967b) obtains the exact distribution of (42.51) for p = 2, 3, 4 and 6. Gleser 
(1966) shows that the sphericity test is unbiassed. 


One may similarly test that $V = V_0$, a specified matrix. The distribution of this LR 
statistic is discussed by Korin (1968). 
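
A first-order sphericity test needs only the determinant and the trace of the transformed dispersion matrix. The sketch below is a hedged illustration: it assumes the criterion in the usual Mauchly form, $l^{2/n} = |c| / (\operatorname{tr} c/p)^p$, and the first approximation $-n \log l^{2/n}$ with ½p(p+1) - 1 degrees of freedom; the matrix is made up and the names are hypothetical. 

```python
import math


def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in a]
    size = len(m)
    d = 1.0
    for c in range(size):
        piv = max(range(c, size), key=lambda r: abs(m[r][c]))
        if m[piv][c] == 0.0:
            return 0.0
        if piv != c:
            m[c], m[piv] = m[piv], m[c]
            d = -d
        d *= m[c][c]
        for r in range(c + 1, size):
            fac = m[r][c] / m[c][c]
            for cc in range(c, size):
                m[r][cc] -= fac * m[c][cc]
    return d


def sphericity_test(c, n):
    """Return (l^(2/n), -n log l^(2/n), f) for dispersion matrix c, sample number n."""
    p = len(c)
    mean_variance = sum(c[i][i] for i in range(p)) / p
    w = det(c) / mean_variance ** p
    return w, -n * math.log(w), p * (p + 1) // 2 - 1


if __name__ == "__main__":
    print(sphericity_test([[2.0, 0.5], [0.5, 1.0]], 30))  # illustrative matrix
```

The ratio lies between 0 and 1 and equals 1 only when the matrix is a multiple of the identity. 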


42.14 The homogeneity tests can be used to generalize to p dimensions the tests 
of univariate AV. The role of variances is now taken over by generalized variances, 
i.e., dispersion determinants. 


Example 42.2 

We consider a two-dimensional case (p = 2), following Pearson and Wilks (1933). 
Five samples are available, each of twelve members, of aluminium die-castings (k = 5, 
$n_t$ = 12 for all t, n = 60). On each of the 60 specimens two measurements are taken, 
tensile strength (in 1000 lb per sq. inch) which we call x, and hardness (Rockwell E) 
which we call y. The data may be summarized as follows: 

    Sample            x                   y
    number      mean      s.d.      mean     s.d.     Correlation
       1       33.399    2.565     68.49    10.19       0.683
       2       28.216    4.318     68.02    14.49       0.876
       3       30.313    2.188     66.57    10.17       0.714
       4       33.150    3.964     76.12    11.18       0.715
       5       34.269    2.715     69.92     9.88       0.805

We test first of all the hypothesis $H_1$ that dispersions are homogeneous. We have 
the following results: 

               Sums of squares        Sum of         Generalized      log10 of
       t        about means           products       variance         generalized
                 x          y         about means    (determinant)    variance
       1       78.948    1247.18        214.18         365.204         2.56254
       2      223.695    2519.31        657.62         910.401         2.95923
       3       57.448    1241.78        190.63         243.029         2.38566
       4      187.618    1473.44        375.91         938.451         2.97241
       5       88.456    1171.73        259.18         253.281         2.40360
    Totals    636.165    7653.44       1697.52                        13.28344

For the pooled variances and covariances about respective means we have 

    c_11 = 636.165/60 = 10.6028
    c_22 = 127.5573
    c_12 = 28.2920
    |c_jl| = 552.018.

The criterion is then, from (42.16) in the form $l^{2/n}$, given by 

$$ \frac{2}{n} \log l = \frac{1}{n} \sum_{t=1}^{k} n_t \log \{ |c_{jlt}| / |c_{jl}| \} = \bar{1}.914{,}73 \quad (\text{that is, } -0.085{,}27), $$

with logs taken to base 10, giving $l^{2/n}$ = 0.8217. For a test we find $-n \log_e l^{2/n}$ = 11.78. 
The number of degrees of freedom is 3(k-1), namely 12. The observed value is consistent 
with homogeneity, and we can now proceed to hypothesis $H_2$, that the means 
are equal given equality of dispersions. For this, we require to apply (42.17) and hence 
to find the pooled variance about pooled means. The data are as follows: 


    Source               d.fr.          SS(x)       SS(y)      SP(xy)
    Between samples      k-1 = 4       306.089      662.77     214.86
    Within samples       n-k = 55      636.165     7653.44    1697.52
    Total                n-1 = 59      942.254     8316.21    1912.38


The pooled dispersion determinant is then 1160.77 and the criterion is given by 
    -60 log (552.018/1160.77) = 44.59. 
The number of degrees of freedom is 2(k-1) = 8. The result rejects $H_2$ at extremely 
small test sizes. 
We conclude that there is heterogeneity in the means. We now test x and y 
separately. 


                         Estimates of variance
                            x          y        d.fr.
    Between samples      76.522     165.69        4
    Within samples       11.566     139.15       55


An ordinary F-test shows that at the 1 per cent point the differences between tensile 
strength, but not the differences between hardness, are responsible for the heterogeneity. 
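
The arithmetic of Example 42.2 can be retraced from the sums of squares and products alone. The sketch below is offered as a check rather than as the authors' computing scheme: it uses divisor $n_t$ (or n) throughout, natural logarithms, and the criteria in the forms (42.16) and (42.17); the variable and function names are illustrative. 

```python
import math

# (SS_x, SS_y, SP_xy) about sample means for the five samples, n_t = 12 each.
WITHIN = [
    (78.948, 1247.18, 214.18),
    (223.695, 2519.31, 657.62),
    (57.448, 1241.78, 190.63),
    (187.618, 1473.44, 375.91),
    (88.456, 1171.73, 259.18),
]
BETWEEN = (306.089, 662.77, 214.86)  # between-samples SS_x, SS_y, SP_xy
N_T = 12


def generalized_variance(ss_x, ss_y, sp, n):
    """Determinant of the 2 x 2 dispersion matrix with divisor n."""
    return (ss_x * ss_y - sp * sp) / (n * n)


def example_42_2():
    k = len(WITHIN)
    n = k * N_T
    pooled = tuple(sum(col) for col in zip(*WITHIN))
    det_within = generalized_variance(*pooled, n)
    total = tuple(w + b for w, b in zip(pooled, BETWEEN))
    det_total = generalized_variance(*total, n)
    # -2 log l for H1: homogeneity of dispersions.
    stat_h1 = -sum(N_T * math.log(generalized_variance(*s, N_T) / det_within) for s in WITHIN)
    # -2 log l for H2: equality of means given equal dispersions.
    stat_h2 = -n * math.log(det_within / det_total)
    return det_within, det_total, stat_h1, stat_h2


if __name__ == "__main__":
    print(example_42_2())
```

The output reproduces, to rounding, the determinants 552.02 and 1160.77 and the two test statistics 11.78 (12 d.fr.) and 44.59 (8 d.fr.) quoted above. 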


Multivariate regression 
42.15 We now suppose that our variables x are linearly related to a set of z's 
which may be regarded as fixed, by 

$$ \underset{p \times n}{x} = \underset{p \times q}{\beta}\ \underset{q \times n}{z} + \underset{p \times n}{\varepsilon}. \tag{42.55} $$

$\beta$ is a p x q matrix of coefficients, and $\varepsilon$ is a p x n matrix of errors. If its 
sub-vectors $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_p$ were independent we should, of course, have a set of p independent 
regressions, one for each vector x. We shall not assume that they are independent. 
By taking one variable $z_1$ as a dummy variable with value unity we may allow the 
component $\beta_{j1} z_1$ to represent means, and hence assume all the elements of $\varepsilon$ to have 
a zero mean (cf. Exercise 19.1). 

Our object is to estimate $\beta$ and the dispersion matrix of $\varepsilon$ which we shall write as 
$\sigma$. We write $\alpha = \sigma^{-1}$. 

(42.55) is, in different notation, the p-variate generalization of the general linear 
model of Chapters 19, 24, and subsequently. Here, however, we assume normality 
from the outset. 


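42.16 As in the univariate case we estimate $\beta$ by maximizing the likelihood, which 
is given by 

$$ L = \frac{|\alpha|^{n/2}}{(2\pi)^{pn/2}} \exp\left\{ -\tfrac12 \sum_{t=1}^{n} \sum_{j,k=1}^{p} \alpha_{jk} \Big(x_{jt} - \sum_{l} \beta_{jl} z_{lt}\Big)\Big(x_{kt} - \sum_{m} \beta_{km} z_{mt}\Big) \right\} \tag{42.56} $$

where the suffix t is a sample label. 
For the ML estimator of $\beta_{jl}$ we find 

$$ \frac{\partial \log L}{\partial \beta_{jl}} = \sum_{t=1}^{n} \sum_{s=1}^{p} \alpha_{js}\, z_{lt} \Big(x_{st} - \sum_{m=1}^{q} \beta_{sm} z_{mt}\Big) = 0. \tag{42.57} $$

Writing 

$$ u_{sl} = \sum_{t} x_{st} z_{lt}, \qquad u = x z', \tag{42.58} $$
$$ v_{lm} = \sum_{t} z_{lt} z_{mt}, \qquad v = z z', \tag{42.59} $$

we find from (42.57) 

$$ \sum_{s=1}^{p} \alpha_{js} \Big(u_{sl} - \sum_{m=1}^{q} \hat\beta_{sm} v_{ml}\Big) = 0 $$

and as $\alpha$ is non-singular this gives us 

$$ u_{sl} - \sum_{m=1}^{q} \hat\beta_{sm} v_{ml} = 0 \tag{42.60} $$

which we may also write as 

$$ \underset{p \times q}{\hat\beta} = \underset{p \times q}{u}\ \underset{q \times q}{v^{-1}}. \tag{42.61} $$

For the estimator of $\alpha$ we have 

$$ \frac{\partial \log L}{\partial \alpha_{jk}} = \frac{n}{2} \frac{A_{jk}}{|\alpha|} - \tfrac12 \sum_{t=1}^{n} \Big(x_{jt} - \sum_{l} \hat\beta_{jl} z_{lt}\Big)\Big(x_{kt} - \sum_{m} \hat\beta_{km} z_{mt}\Big) = 0, \tag{42.62} $$

where $A_{jk}$ is the cofactor of $\alpha_{jk}$ in $|\alpha|$. Bearing in mind that $\alpha$ is inverse to $\sigma$ we find 

$$ \hat\sigma_{jk} = \frac{1}{n} \sum_{t=1}^{n} \Big(x_{jt} - \sum_{l} \hat\beta_{jl} z_{lt}\Big)\Big(x_{kt} - \sum_{m} \hat\beta_{km} z_{mt}\Big) \tag{42.63} $$

which we may also write 

$$ \hat\sigma = \frac{1}{n} (x - \hat\beta z)(x - \hat\beta z)'. \tag{42.64} $$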

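42.17 Since the z's are fixed we find from (42.61) 

$$ E(\hat\beta) = \{E(u)\} v^{-1} = E(x)\, z' v^{-1} = \beta z z' v^{-1} = \beta. \tag{42.65} $$

Writing temporarily V for the inverse of v we have from (42.61) 

$$ \hat\beta_{jk} = \sum_{m} u_{jm} V_{mk}. \tag{42.66} $$

From (42.55), multiplying by $z_{lt}$ and summing over t, then by $V_{lk}$ and summing over 
l, we find 

$$ \hat\beta_{jk} = \beta_{jk} + \sum_{l} \sum_{t} \varepsilon_{jt} z_{lt} V_{lk}. \tag{42.67} $$

Hence 

$$ \hat\beta_{jk} - \beta_{jk} = \sum_{l} \sum_{t} \varepsilon_{jt} z_{lt} V_{lk}. \tag{42.68} $$

Remembering that $\varepsilon_{jt}$ and $\varepsilon_{lu}$ are independent unless t = u we find, with appropriate 
dummy suffixes for the summations, 

$$ E(\hat\beta_{jk} - \beta_{jk})(\hat\beta_{lm} - \beta_{lm}) = E\Big(\sum_{r}\sum_{t} \varepsilon_{jt} z_{rt} V_{rk}\Big)\Big(\sum_{s}\sum_{u} \varepsilon_{lu} z_{su} V_{sm}\Big) $$
$$ \qquad = \sigma_{jl} \sum_{t}\sum_{r}\sum_{s} z_{rt} z_{st} V_{rk} V_{sm} $$
$$ \qquad = \sigma_{jl} \sum_{r}\sum_{s} v_{rs} V_{rk} V_{sm} $$
$$ \qquad = \sigma_{jl} V_{km}. \tag{42.69} $$

There are pq quantities $\beta$. The estimators $\hat\beta$ are distributed about mean $\beta$ with 
dispersions given by (42.69). We may write equivalently 

$$ E(\hat\beta_j - \beta_j)'(\hat\beta_l - \beta_l) = \sigma_{jl}\, v^{-1}. \tag{42.70} $$

As the $\hat\beta$'s are linear functions of x they are jointly normally distributed. 
By putting p = 1 and transposing all matrices in (42.55), we return to the LS theory 
of Chapter 19, as the reader should verify for (42.61) and (42.70). 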


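42.18 We may write 

$$ \sum_{t} \Big(x_{jt} - \sum_{l} \beta_{jl} z_{lt}\Big)\Big(x_{kt} - \sum_{m} \beta_{km} z_{mt}\Big) $$
$$ \quad = \sum_{t} \Big\{\Big(x_{jt} - \sum_{l} \hat\beta_{jl} z_{lt}\Big) + \sum_{l} (\hat\beta_{jl} - \beta_{jl}) z_{lt}\Big\} \Big\{\Big(x_{kt} - \sum_{m} \hat\beta_{km} z_{mt}\Big) + \sum_{m} (\hat\beta_{km} - \beta_{km}) z_{mt}\Big\} $$

and, the cross-products vanishing, 

$$ \quad = \sum_{t} \Big(x_{jt} - \sum_{l} \hat\beta_{jl} z_{lt}\Big)\Big(x_{kt} - \sum_{m} \hat\beta_{km} z_{mt}\Big) + \sum_{l}\sum_{m} (\hat\beta_{jl} - \beta_{jl})\, v_{lm}\, (\hat\beta_{km} - \beta_{km}). \tag{42.71} $$

This is analogous to the univariate splitting of the sum of squares of errors into sum 
of squares of residuals (deviations from the estimated regression line) and a term due 
to the deviation of the estimated from the true parameter values; cf. (19.42), Vol. 2. 
It may be shown, by the argument used in reaching (42.69), that the last term on the 
right in (42.71) has an expectation of $q\sigma_{jk}$. The first term on the right in (42.71) is 
a quadratic form in the x's, which are multivariate normal. We do not, however, test 
one against the other, as in the univariate case, but the former against the whole. 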


42.19 To do this we require a theorem to the effect that the estimated dispersion 
$\hat\sigma$ of (42.64) is distributed in Wishart's form with n-q instead of n. From (42.64), 
(42.59) and (42.60) we find the equivalent form 

$$ \hat\sigma = \frac{1}{n} (x x' - \hat\beta v \hat\beta'). \tag{42.72} $$

Our argument pursues the same line as the one used in 27.22, Vol. 2, in which we 
showed that, for normal variation, partial correlations have the same distribution as 
ordinary correlations, with lower degrees of freedom. Consider, in fact, a space of 
n dimensions. The (q x n) matrix z determines a set of q vectors, say $OP_1, \ldots, OP_q$, 
in this space, themselves lying in a q-space. 
Now 

$$ \varepsilon = (x - \hat\beta z) + (\hat\beta - \beta) z $$

and the two parts on the right are orthogonal in the n-space. For 

$$ \sum_{t} \Big(x_{jt} - \sum_{l} \hat\beta_{jl} z_{lt}\Big) \sum_{m} (\hat\beta_{km} - \beta_{km}) z_{mt} = \sum_{m} \Big(u_{jm} - \sum_{l} \hat\beta_{jl} v_{lm}\Big)(\hat\beta_{km} - \beta_{km}) $$

and the first bracket vanishes in virtue of (42.60). 

Thus the vectors $x - \hat\beta z$ are orthogonal to the q-space. Our original $\varepsilon$ vectors 
are represented as the sum of two parts, one lying in the q-space and the other orthogonal 
to it. In our n-space, orthogonality implies zero correlation which implies independence 
for normal variables. Thus the two parts on the right in (42.71) are independent. 

It follows that the system represented by $x - \hat\beta z$ has a Wishart distribution of 
dispersions, but with n-q instead of n, the variation being orthogonal to a space of q 
dimensions. 


42.20 We may now consider the testing of a hypothesis concerning regressions. 
Usually we require to know whether any of the z's contribute significantly to the variation 
of x; or equivalently, if we "extract" from x the variation due to a certain subset 
of ($\beta z$)'s, by the usual covariance technique, are the residuals significantly dependent 
on the remaining ($\beta z$)'s? 

Suppose then that we take q $\beta$'s and test the hypothesis that a subset of m < q are 
zero. On the hypothesis that they are not, we estimate the q $\beta$'s and $\sigma$ and substitute 
in the likelihood (42.56). Now if we multiply (42.62) by $\hat\alpha_{jk}$ and sum over j, k, we 
find that the exponent in the likelihood reduces to a constant. Thus the likelihood 
of (42.56), apart from constants, reduces to $|\hat\alpha|^{n/2}$. On the tested hypothesis with 
m $\beta$'s equal to zero, we find likewise that the likelihood reduces to some numerical 
multiple of, say, $|\hat\alpha_{q-m}|^{n/2}$ where $\hat\alpha_{q-m}$ is the inverse of the estimated dispersion matrix 
of (42.64), or equivalently of (42.72) with only q-m $\beta$'s under estimate. Thus the 
likelihood ratio is 

$$ l = \frac{|\hat\alpha_{q-m}|^{n/2}}{|\hat\alpha|^{n/2}} = \left\{ \frac{|\hat\sigma|}{|\hat\sigma_{q-m}|} \right\}^{n/2}. \tag{42.73} $$

(42.73) is distributed as the ratio of two Wishart determinants based on n-q and 
n-(q-m) sample numbers. Moreover, by the same sort of argument as was used 
in 42.19, the vectors corresponding to the q-m $\beta$'s supposed not to vanish are orthogonal 
in the q-space to the m $\beta$'s which are supposed to vanish, and hence the functions 
contributing to $\hat\sigma_{q-m}$ are an independent subset of those entering into $\hat\sigma$. Thus the 
criterion of (42.73) may be tested in the manner of the $l$ criterion considered earlier in 
the chapter. The following example will illustrate the method. 


Example 42.3 (Data from M. M. Barnard, 1935; Bartlett, 1947) 

Miss Barnard had four series of Egyptian skulls, 91 Predynastic, 162 from the 
sixth to twelfth dynasties, 70 from the twelfth and thirteenth dynasties, and 75 from 
Ptolemaic dynasties, 398 in all. On each skull four measurements were taken (in 
millimetres): 

    x_1 = maximum breadth
    x_2 = basi-alveolar length
    x_3 = nasal height
    x_4 = basi-bregmatic height

The means of the series as given by Barnard are 

              Series I       Series II      Series III     Series IV
              n_1 = 91       n_2 = 162      n_3 = 70       n_4 = 75
    x_1     133.582,418    134.265,432    134.371,429    135.306,667
    x_2      98.307,692     96.462,963     95.857,143     95.040,000
    x_3      50.835,165     51.148,148     50.100,000     52.093,333
    x_4     133.000,000    134.882,716    133.642,857    131.466,667
                                                                (42.74)

The sums of squares and products within series (which have 394 degrees of freedom) 
are 

               x_1            x_2            x_3            x_4
    x_1    9661.997,440    445.573,301   1130.623,900   2148.584,210
    x_2                   9073.115,207   1239.221,990   2255.812,722
    x_3                                  3938.320,351   1271.054,662
    x_4                                                 8741.508,829
                                                                (42.75)

The similar sums for all observations together (397 d.fr.) are 

               x_1            x_2            x_3            x_4
    x_1    9785.178,098    214.197,666   1217.929,248   2019.820,216
    x_2                   9559.460,890   1131.716,372   2381.126,040
    x_3                                  4088.731,856   1133.473,898
    x_4                                                 9382.242,720
                                                                (42.76)

Finally, the sums of squares between classes (3 d.fr.) are 

               x_1            x_2            x_3            x_4
    x_1     123.180,658   -231.375,635     87.305,348   -128.763,994
    x_2                    486.345,863   -107.505,618    125.313,318
    x_3                                   150.411,505   -137.580,764
    x_4                                                  640.733,891
                                                                (42.77)

We may first of all consider whether the data are homogeneous, in particular whether 
there are any significant differences between means of series. The appropriate criterion 
is the ratio $l_{H_2}$ of (42.17), namely the ratio of the determinants of (42.75) and (42.76) 
which is 

$$ l^{2/n} = \frac{2426.898}{2954.474} = 0.8214. $$

$-n \log l^{2/n}$, taking n as the number of degrees of freedom 397, is then 77.3 and the 
number of d.fr. for the approximate $\chi^2$ test is 3 x 4 = 12. Thus we conclude that the 
data are not homogeneous even in the mean. 

There are several questions we may wish to ask at this point. For example, from 
(42.76) it is clear that within classes there is a considerable correlation between the 
variables. Are the differences between the means attributable to influences from all 
four variables, or, for example, do $x_3$ and $x_4$ contribute to the differences only because 
of their correlation with $x_1$ and $x_2$? To answer this we determine the regressions of 
$x_3$ and $x_4$ on $x_1$ and $x_2$, extract them from the total variation, and test the residual 
matrices. Thus we regard $x_3$ and $x_4$ as a matrix of dependent variables (the x's of 
42.16) and $x_1$, $x_2$ as z's. 

In our present case, the dispersion matrix of $x_1$, $x_2$ from (42.75) is 

    (1/394) [ 9661.997,440    445.573,301 ]
            [                9073.115,207 ]                     (42.78)

the inverse of which is 

    394 x 10^-4 [ 1.037,332   -0.050,942 ]
                [              1.104,659 ]                      (42.79)


The variation due to regression of x on z is, from (42.72), 

$$ \hat\beta v \hat\beta' = (u v^{-1})(v)(v^{-1} u') = u v^{-1} u'. $$

In our case x refers to $x_3$ and $x_4$, z to $x_1$ and $x_2$, so we find for this expression 

    [ 1130.623,900  1239.221,990 ]         [  1.037,332  -0.050,942 ] [ 1130.623,900  2148.584,210 ]
    [ 2148.584,210  2255.812,722 ]  10^-4  [ -0.050,942   1.104,659 ] [ 1239.221,990  2255.812,722 ]

        = [ 287.967,620   534.238,796 ]
          [ 534.238,796   991.621,041 ]                         (42.80)

Subtracting this from the matrix of $x_3$ and $x_4$, we have as residual 

    [ 3650.352,731    736.815,866 ]
    [                7749.887,788 ]                             (42.81)

with 394 - 2 = 392 d.fr. 
Similarly, operating on (42.76) for the totals of product sums we find the residual 
persone код 


8393-755,848. 
The question is whether the matrices (42.81) and (42.82) are significantly different. 
We can regard the latter as the residual in the regression of $x_3$, $x_4$ on $x_1$, $x_2$ plus a vector 
representing the mean; the former has had the mean abstracted in each class. The 
ratio (42.73) of their determinants is 


(42.82) 


0-316,003 
$-n \log l^{2/n}$, with n = 392, is then 51.39 and the appropriate number of degrees of freedom 
is 3 x 2 = 6. Homogeneity is therefore rejected. We conclude that $x_3$ and $x_4$ are 
relevant variables in the sense that the differences between means cannot be ascribed to 
$x_1$ and $x_2$ alone. 

A further question considered by Miss Barnard was whether these variables might 
each have a linear regression on time. To investigate this we require a time variable, 
and the intervals between the four series were taken proportionately to 2, 1, 2. We 
may therefore conveniently take the values of t as —5, —1, 1, 5. On this basis 

    S(t - t̄)²      =  4307.668,32

    S x_1 (t - t̄)  =   718.762,86
    S x_2 (t - t̄)  = -1407.260,75
    S x_3 (t - t̄)  =   410.101,94
    S x_4 (t - t̄)  =  -733.427,58

We are now examining the regression of each of the x's on the extraneous variable 
time. The sums of squares and products due to regression (1 degree of freedom) are 

               x_1            x_2            x_3            x_4
    x_1     119.930,358   -234.810,812     68.428,235   -122.377,258
    x_2                    459.734,449   -133.975,163    239.601,596
    x_3                                    39.042,852    -69.824,358
    x_4                                                  124.874,099
                                                                (42.83)

Here, for example, the item in row 1 and column 2 is 

$$ \frac{S x_1(t-\bar t)\ S x_2(t-\bar t)}{S(t-\bar t)^2} = \frac{(718.762{,}86)(-1407.260{,}75)}{4307.668{,}32} = -234.810{,}812. $$
The residual after removing the regression on time from the original matrix is given 
by subtracting (42.83) from (42.76), namely 

               x_1            x_2            x_3            x_4
    x_1    9665.247,740    449.008,478   1149.501,013   2142.197,474
    x_2                   9099.726,441   1265.691,535   2141.524,444
    x_3                                  4049.689,004   1203.298,256
    x_4                                                 9257.368,621

    with 396 d.fr.                                              (42.84)

We now test whether this residual is homogeneous, taking the variation within 
series as given by (42.75) against (42.84). The ratio of determinants is $l^{2/n}$ = 0.9031. 
$-n \log l^{2/n}$ is 40 with 2 x 4 = 8 d.fr. We reject the hypothesis of homogeneity. 

We conclude that if regression on time is linear, there are differences between the 
series which are not due to temporal effects. 


Schatzoff (1966b) and Consul (1966) give the exact distribution of /, in closed form 
when р or m is even and when р<4 respectively. 


42.21 The family of likelihood criteria which we have considered so far are ratios
of determinants of dispersions of one kind or another and, algebraically speaking, are
relatively simple. Other statistics which might be useful in testing are the value or
values of latent roots of dispersion matrices. For example, the equality of two
dispersion matrices A and B depends on the nearness to unity of the roots of $|A - \lambda B| = 0$.
In a sense, then, tests of ratios of type $|A|/|B|$ can be carried out on latent roots.
However, as we noted in 41.27, the exact distributions are not yet well tabulated,
and although we may derive large-sample approximations, they are not so good as those
for the likelihood criteria, which can be carried to any desired degree of accuracy by
the methods of 42.11.

In fact, some of our likelihood ratio criteria are symmetric functions of the latent
roots. For example, in $|A - \lambda B| = 0$ the product of all $p$ roots is $|A|/|B|$, as is
easily seen by writing $A = AB^{-1}B$. Cases where we are more interested in individual
roots occur in the next chapter.
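
The step indicated by the factorization $A = AB^{-1}B$ may be sketched as follows, assuming only that B is non-singular (the labelling $\lambda_1, \ldots, \lambda_p$ of the roots is introduced here for illustration):

```latex
\[
|A - \lambda B| = |(AB^{-1} - \lambda I)\,B| = |AB^{-1} - \lambda I|\;|B| ,
\]
so that the roots $\lambda_1, \ldots, \lambda_p$ of $|A - \lambda B| = 0$ are the latent
roots of $AB^{-1}$, and therefore
\[
\prod_{j=1}^{p} \lambda_j = |AB^{-1}| = \frac{|A|}{|B|} .
\]
```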


Example 42.4 (Foster and Rees, 1957, Data from Ashton, Lipton and Healy, 1957) 


Data are given for two measurements ($p = 2$) on three groups of males: human,
chimpanzee, and orang-utan. The measurements were on tooth length ($x_1$) and breadth
($x_2$) for the permanent upper second premolar, and were transformed to logarithms to
stabilize variance.

The sums of products were as follows:


Sums of products

                  d.fr.    x_1^2       x_1 x_2      x_2^2
Between groups      2     0.544,941   0.525,765   0.509,075
Within groups     154     0.137,786   0.069,342   0.092,792
Total             156     0.682,727   0.595,107   0.601,867
                                                     (42.85)

To test homogeneity we may consider the roots of
\[ |A - \lambda B| = 0 \]
where A is the matrix between groups and B the total. The resulting equation is
a quadratic with roots
\[ \lambda_2 = 0.020{,}238, \qquad \lambda_1 = 0.856{,}543. \]

We take the greater root as our criterion. The larger it is the more we suspect the
hypothesis of homogeneity. From the Foster-Rees table (cf. 41.27(1)) the 99 per cent. point
of its distribution is found to be about 0.08. The observed value is highly in excess
of this.
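
The arithmetic of this example can be retraced with a short script. What follows is a minimal sketch, not the authors' procedure: it assumes the three columns of (42.85) are the sums of squares and products for $x_1^2$, $x_1 x_2$ and $x_2^2$, and the function names are hypothetical. Because the determinants involve heavy cancellation, the smaller root is sensitive to rounding in the printed sums, so its last digits need not agree with the text.

```python
import math

# Sums of products from table (42.85), stored as (x1^2, x1*x2, x2^2).
between = (0.544941, 0.525765, 0.509075)   # matrix A (between groups)
total = (0.682727, 0.595107, 0.601867)     # matrix B (total)


def det2(m):
    """Determinant of a symmetric 2x2 matrix stored as (m11, m12, m22)."""
    return m[0] * m[2] - m[1] ** 2


def generalized_roots_2x2(a, b):
    """Roots of |A - lambda*B| = 0 for symmetric 2x2 A and B, largest first."""
    # Expanding the determinant gives the quadratic
    # |B| lambda^2 - (a11*b22 + a22*b11 - 2*a12*b12) lambda + |A| = 0.
    quad = det2(b)
    lin = a[0] * b[2] + a[2] * b[0] - 2.0 * a[1] * b[1]
    const = det2(a)
    disc = math.sqrt(lin * lin - 4.0 * quad * const)
    return (lin + disc) / (2.0 * quad), (lin - disc) / (2.0 * quad)


largest, smallest = generalized_roots_2x2(between, total)
print(f"largest root  = {largest:.4f}")
print(f"smallest root = {smallest:.4f}")
# The product of the roots should equal |A|/|B| (cf. 42.21).
print(f"product = {largest * smallest:.6f}, |A|/|B| = {det2(between) / det2(total):.6f}")
```

The larger root is the criterion compared with the Foster-Rees percentage point; the final line checks the product-of-roots identity noted in 42.21.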
The group means were 


Group         No. in group   $\bar{x}_1$   $\bar{x}_2$
Human              59          1.846         198
Chimpanzee         55          1.865         2.008
Orang-utan         43          19%           2419


Had we wanted to test the hypothesis that A and B are equal, without knowing
which is larger, we should have had to test the smallness of the lesser root as well. 
In the general case this presents a theoretical difficulty, since the joint distribution of 
smallest and largest roots is not known. However, as might be expected, they tend to 
independence for large p. 


Power of the tests 


42.22 We have already remarked on the embarrassing profusion of parameters 
appearing in a multivariate situation. Our test criteria do not contain them, but when 
we wish to specify alternatives in order to ascertain the power of a test we are in a position 
of some complexity. 

For a test of a mean vector based on $T^2$ (cf. 41.16) the power can be ascertained from
existing tables. In fact, if the parent vector is $\boldsymbol{\mu}$, the distribution of $T^2$ based on another
vector $\boldsymbol{\mu}_0$, namely

Т? = n-p) с-1(&— po), (42.86) 
has a non-central F distribution with $p$, $n-p$ degrees of freedom and non-centrality
parameter $n(\boldsymbol{\mu}-\boldsymbol{\mu}_0)'\mathbf{V}^{-1}(\boldsymbol{\mu}-\boldsymbol{\mu}_0)$. We can then use non-central F, or one of the
approximations to it (cf. 24.32), provided that we can specify $\boldsymbol{\mu}$. It may also be shown (Simaika,
1941) that $T^2$ is uniformly most powerful in the class of tests whose power depends only
on the non-centrality parameter (cf. 24.36 in the univariate case). Similar remarks
apply in the two-sample case (cf. 41.17).


42.23 For further studies of distributions and tests in the multivariate case reference
may be made to the books by T. W. Anderson (1958) and E. L. Lehmann (1959).

The problems associated with the Behrens-Fisher test for the difference of two means
when variances are not equal have given rise to considerable controversy and a good
deal of alleged paradox in multivariate extensions (see Mauldon (1955)). We noted in
21.15, Vol. 2, that the problem can, in fact, be solved by a method due to Scheffé which
avoids these difficulties. This method has been generalized by Bennett (1951) to the
multivariate case. See T. W. Anderson (1958, Section 5.6) and (1964), and Exercise
42.12.

For power functions see Seber (1964b), Darroch and Silvey (1963), Hogg (1961).
Das Gupta et al. (1964) and T. W. Anderson and Das Gupta (1964a, b) obtained results
on the monotonicity of the power functions of a number of tests of multivariate hypotheses.

Arnold (1964) has considered the distribution of $T^2$ under permutations, and Holloway
and Dunn (1967) its robustness to inequality of dispersions, in the two-sample case.
Ito and Schull (1964) have discussed the robustness of the $T_0^2$ test, a generalization to
several samples by Lawley (1939) and Hotelling (1951) of the two-sample $T^2$ test for the
equality of mean-vectors, whose distribution is studied by Constantine (1966). Pillai
and Jayachandran (1967) compute the power of $T_0^2$, the LR test (42.17), and other tests.
The tests are asymptotically equivalent; in small samples, they differ little. Schatzoff
(1966a) compares the tests using a different criterion, and concludes that there is little to
choose between (42.17) and $T_0^2$. Pillai and Jayachandran (1967) make a similar study
for tests of independence, where (42.44) is the LR test.

Posten and Bargmann (1964) give a method for obtaining the power of a LR test 
which is completely general for any hypothesis imposing one or two constraints. 


42.24 Tamura (1966) gives asymptotic theory for distribution-free tests of the equality
of location-parameter vectors in a set of otherwise identical continuous multivariate
distributions. G. K. Bhattacharyya (1967) studies their efficiencies, concluding that the
normal scores test is to be preferred. P. K. Sen and Puri (1967) give one-sample
multivariate location tests using ranks.


EXERCISES 


42.1 Show that for p = 1 the use of the criterion ly, of 42.6 becomes equivalent to an F-test. 


42.2 By considering the maximization process over the various domains of the parameters,
show that (42.18) is necessarily true.


42.3 Following Example 42.1, show that, for the criterion Ги, —2p log l is distributed 
approximately as $\chi^2$ with $\tfrac{1}{2}(k-1)p(p+1)$ d.fr. and
\[
\rho = 1 - \left(\sum_{i}\frac{1}{\nu_i} - \frac{1}{\nu}\right)\frac{2p^2+3p-1}{6(p+1)(k-1)} .
\]


42.4 Show that [?/^ of (42.45) can be represented as the product of independent variables y 


q P 
tf fi 5) 
j=2 |k=1 
where уук is a Beta-variable with parameters 3(n— 5;— А), 4D; 
З= 
апа Б = = Pa 
a=l 


42.5 Derive equation (42.41). 
42.6 Derive equations (42.52)-(42.54). 


42.7 A sample of п values is given from a single p-variate normal population. Consider 
the hypotheses 


H: that means and dispersions are equal for each variate; 
Hy: that dispersions are equal regardless of means (i.e. all variances are equal and all covariances 


are equal); 
Hy: that given equality of dispersions, all means are equal. 


Show that the likelihood ratio criteria are given by 
Bini s END — 
(1.—7)?71 G9? {1 + (p 7 1)r0} 
n dex] у 
0-777169? {1+ (p-1)y) 
(Lhe = L 
2
where s? is the mean variance X sj/p, 
fe^ 


r= X са/{р(ф—1)%}, 
3#® 
and sj ғо аге the variance and correlation calculated from the pooled variables, e.g. 


жЕ. 1 
26 =, — 2)? -ygb— ï — xy. 
% E HEE тй rs 3:55 gy. 
Show that $-2\log$ of each of the three criteria is distributed approximately as $\chi^2$ with
$\tfrac{1}{2}p(p+3)-3$, $\tfrac{1}{2}p(p+1)-2$ and $p-1$ d.fr. respectively.
(Wilks, 1946) 


42.8 Use the distribution of (42.25) to confirm the conclusion of Example 42.4. 
42.9 Verify the results of 42.12. 


42.10 Derive $T^2$ as a likelihood ratio criterion in the form


and derive its large-sample distribution in the null case. 


42.11 Show further that in the general case, with a p-variate sample from N(p, ү), and 
Т? = n(x— po) €! (X— po), 
2 — 
then pu is distributed in the non-central F form with p, n—p d.fr. and non-centrality 
factor n(y.— po) YT (&— Bo) = T°, say. 
4242 х0(ј = 1, 2, ..., т), х2 (ј = 1, 2, .. ., т) are samples from two p-variate 
populations N(u,, Y1), №из, ү). Define 
1 җ 1 
3 м apa Sueciae, 
2" М т" V (nym) j=1 T Жш? 
ј= 1, 2,..., т, k= 1, 2,..., т, 
ѕо аё ў = #0) #02), 
Show that the covariance matrix of the у'з is given by 


tap = Sap (st x). 
ny 

Defining w by 

СА 

(=w= 3 vDo, 

j= 
show that T=mywy 
is distributed as 7? with »,—1 d.fr. 

(Bennett, 1951) 


42.13 Show that any test based on $T^2$ is invariant under a non-singular linear transformation
of the variables with matrix say M. By considering a transformation reducing the dispersion
matrix $\mathbf{c}$ to $\mathbf{I}$, show that the only invariant function involving only $\bar{\mathbf{x}}$ and $\mathbf{c}$ is $\bar{\mathbf{x}}'\mathbf{c}^{-1}\bar{\mathbf{x}}$.


42.14 Referring to Exercises 42.11 and 42.13, show that the distribution of $T^2$ may be
written as a constant times

$e (T2/(0—1)H2 Tnj) 
EST К ыт „зү. 5 е 
ere CHS T. jara cp) (+ T/a- jn 

Noting that the most powerful test using $T^2$ against $\tau^2 > 0$ is the ratio of this density to the value it
takes when $\tau = 0$, show that the test is uniformly most powerful.


42.15 In the multivariate regression situation consider the partition of B into B, B, with 
qı and q; columns respectively. For testing the hypothesis Н: f, = fi show that the (2/n)th power 
of the likelihood ratio can be written 

Drm 1:2. —— I 
[n6-- (8: 8v... (8; — 8] 


where Уз = Vii — Vis Уд Va and the v’s are partitions of the matrix v of (42.59) into 41, d» 


rows and columns, viz. as 
Vn Vis), 
Уп Ve 


42.16 Show that when the hypothesis H is true, the moments of / in the previous exercise 
are given by 


(T. W. Anderson, 1958) 


р теби а+1- 00и -2H1-) 
BU) = П гаа) Гиан) 
(Т. W. Anderson, 1958) 


CHAPTER 43 
CANONICAL VARIABLES 


43.1 Apart from problems of distributional mathematics, multivariate analysis 
suffers from one serious handicap in practical application: the difficulty of disentangling 
a complicated inter-relationship among the variables and of interpreting the results 
of the analysis. This leads us to attempt to reduce the number of variables, on the one
hand; and to transform them to independence, on the other. The methods described 
in this chapter are motivated by one or both of these objectives. 


Component analysis 

43.2 As usual, we consider a row vector $x_j$, $j = 1, 2, \ldots, p$, representing a
$p$-dimensional random variable and $n$ observations on it, $x_{jk}$, $k = 1, 2, \ldots, n$,
resulting in a $p \times n$ matrix $\mathbf{x}$. It will often be convenient to measure each $x_j$ about the mean
of its $n$ values, in which case the observed dispersion matrix $\mathbf{c}$ is given by

\[ \mathbf{c} = \frac{1}{n}\,\mathbf{x}\mathbf{x}'. \tag{43.1} \]

We recall that if $\mathbf{c}$ is of rank $m < p$ there are $p - m$ linear relations among the x's. This
implies that there is at least one linear transformation to new variables which are only
$m$ in number; our true dimensionality, so to speak, is $m$, fewer than $p$. The result
derives from the fact, which is not difficult to prove, that the rank of a matrix multiplied
by its transpose is the rank of the original matrix.


Example 43.1
Consider the $p \times p$ matrix
\[
\begin{pmatrix}
1 & \rho & \rho & \cdots & \rho \\
\rho & 1 & \rho & \cdots & \rho \\
\vdots & \vdots & \vdots & & \vdots \\
\rho & \rho & \rho & \cdots & 1
\end{pmatrix}
\tag{43.2}
\]

Add the rows and take out the common factor $1+(p-1)\rho$. Subtract $\rho$ times the
resulting unit row from each other row. We then see that the determinant of the
matrix is
\[ (1-\rho)^{p-1}\{1+(p-1)\rho\}. \tag{43.3} \]
Except in the special case $\rho = 1$ or $\rho = -1/(p-1)$ this cannot vanish. The rank
of (43.2), accordingly, is $p$. Hence we cannot represent a set of equally correlated
variables in fewer than $p$ dimensions.
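
The closed form (43.3) is easily checked numerically. The following is a minimal sketch in which the function names and the chosen values of $p$ and $\rho$ are illustrative assumptions, not taken from the text; the determinant is computed by ordinary Gaussian elimination and compared with (43.3), including the two special cases in which it vanishes.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]          # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-14:      # numerically singular
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                      # a row swap changes the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def equicorrelation(p, rho):
    """The p x p matrix (43.2): units on the diagonal, rho elsewhere."""
    return [[1.0 if i == j else rho for j in range(p)] for i in range(p)]


def closed_form(p, rho):
    """The determinant as given by (43.3)."""
    return (1.0 - rho) ** (p - 1) * (1.0 + (p - 1) * rho)


p = 4
for rho in (0.3, -0.2, 1.0, -1.0 / (p - 1)):
    numeric = determinant(equicorrelation(p, rho))
    print(f"rho = {rho:+.4f}: elimination = {numeric:.6f}, (43.3) = {closed_form(p, rho):.6f}")
```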
We may remark without proof (for which see Ledermann, 1937) that the number of
independent conditions on a symmetric matrix for it to be of rank $m$ is $\tfrac{1}{2}(p-m)(p-m+1)$.


43.3 We may represent the matrix $\mathbf{x}$ geometrically in two different ways. We
may set up a Euclidean space of $p$ dimensions, one for each variable, and regard each
sample set $x_{jk}$, $j = 1, 2, \ldots, p$ as determining a point in it, so that our sample consists
of a swarm of $n$ points; or we may set up a space of $n$ dimensions, one for each
observation, and consider each variable in it, so that the variation is described by $p$ vectors
(lying in a $p$-dimensional space embedded in the $n$-dimensional space). In either
kind of space, degeneration of the matrix $\mathbf{x}$ to rank $m < p$ implies that the $n$ sample
points lie in a sub-space of $m$ dimensions.


43.4 Consider a transformation to new variables $\xi$ given by
\[ \boldsymbol{\xi} = \mathbf{a}\mathbf{x} \tag{43.4} \]

where $\mathbf{a}$ is a matrix of coefficients. We confine our attention to linear transformations
of this kind; non-linear situations are much more difficult to handle, and if they are
suspected to exist an attempt should be made to linearize the data beforehand, for
example, by a logarithmic transformation.

We shall, in fact, specialize $\mathbf{a}$ to be orthogonal and call it $\mathbf{l}'$, with elements $l_{jk}$.
Specifically,
\[ \mathbf{l}'\mathbf{l} = \mathbf{I}. \tag{43.5} \]
We then have for the dispersions of the $\xi$'s, say $V(\xi)$,
\[ V(\xi) = \mathbf{l}'\mathbf{c}\,\mathbf{l}. \tag{43.6} \]
It follows, of course, that
\[ |V(\xi)| = |\mathbf{c}|. \tag{43.7} \]

There are $p^2$ coefficients $l$. Equation (43.5) imposes $\tfrac{1}{2}p(p+1)$ conditions on them,
$\tfrac{1}{2}p(p-1)$ for the off-diagonal products and $p$ for the diagonals. There are thus
$\tfrac{1}{2}p(p-1)$ degrees of freedom in the transformation. Geometrically, it is equivalent
to a rotation in our $p$-space.

We may find one such transformation, at least, for which the $\xi$'s are uncorrelated,
for this imposes $\tfrac{1}{2}p(p-1)$ conditions on them. If the resulting $\xi$'s have variances
$\sigma_1^2, \sigma_2^2, \ldots, \sigma_p^2$, represented by the diagonal matrix $\boldsymbol{\Sigma}$, we have
\[ \mathbf{l}'\mathbf{c}\,\mathbf{l} = \boldsymbol{\Sigma} \tag{43.8} \]
and hence, in virtue of (43.5),
\[ \mathbf{c} = \mathbf{l}\boldsymbol{\Sigma}\mathbf{l}' \tag{43.9} \]
which is equivalent to
\[ \mathbf{c}\,\mathbf{l} = \mathbf{l}\boldsymbol{\Sigma}. \tag{43.10} \]
Considering the first column in the equation $\mathbf{c}\,\mathbf{l} = \mathbf{l}\boldsymbol{\Sigma}$, we have
\[ \sum_{k=1}^{p} c_{jk}\, l_{k1} = \sigma_1^2\, l_{j1} \tag{43.11} \]
or
\[ (\mathbf{c} - \sigma_1^2 \mathbf{I})\,\mathbf{l}_1 = 0, \tag{43.12} \]
where $\mathbf{l}_1$ is the first column of $\mathbf{l}$. Hence
\[ |\mathbf{c} - \sigma_1^2 \mathbf{I}| = 0. \tag{43.13} \]

A similar equation is obeyed by the other values $\sigma_j^2$. Hence the $p$ values of $\sigma^2$ are
the latent roots of $\mathbf{c}$. The corresponding variables $\xi$ are the latent vectors. We shall
call them principal components.
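
For $p = 2$ the rotation can be written down explicitly, which gives a convenient check on (43.5) to (43.8). The sketch below is an illustration only: the dispersion matrix is a hypothetical example, the helper names are invented here, and the closed-form angle used applies to the two-variable case alone.

```python
import math


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def principal_axes_2x2(c):
    """Return (roots, l) for a symmetric 2x2 dispersion matrix c.

    The columns of l are the latent vectors; the first belongs to the
    larger latent root.
    """
    c11, c12, c22 = c[0][0], c[0][1], c[1][1]
    # Angle of the direction of greatest variance in the two-variable case.
    theta = 0.5 * math.atan2(2.0 * c12, c11 - c22)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    l = [[cos_t, -sin_t], [sin_t, cos_t]]
    sigma = matmul(transpose(l), matmul(c, l))   # l'cl, cf. (43.8)
    return (sigma[0][0], sigma[1][1]), l


c = [[4.0, 2.0], [2.0, 3.0]]                     # hypothetical dispersion matrix
roots, l = principal_axes_2x2(c)
sigma = matmul(transpose(l), matmul(c, l))
identity = matmul(transpose(l), l)               # should be I, cf. (43.5)

print("latent roots:", roots)
print("off-diagonal of l'cl:", sigma[0][1])
print("product of roots:", roots[0] * roots[1], " |c| =", c[0][0] * c[1][1] - c[0][1] ** 2)
```

The vanishing off-diagonal term shows that the transformed variables are uncorrelated, and the final line illustrates (43.7).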


43.5 In general there are $p$ different latent roots $\lambda$ of the matrix $\mathbf{c}$. We shall
find it convenient to regard them as of diminishing magnitude, i.e. $\sigma_1^2 \geq \sigma_2^2 \geq \ldots \geq \sigma_p^2$.
If and only if the last $q$ are zero will the matrix $\mathbf{c}$ become of rank $p-q$. The size of
the latent roots thus gives us a test of the rank of the dispersion matrix. We may go
further, and say that if $\sigma_p^2$ is small the variation is "nearly" in $p-1$ dimensions, and
so on.

From the manner of derivation it is clear that the axes in our $p$-space of the first
kind are orthogonal. But we have also transformed to variables which are uncorrelated.
Thus the corresponding vectors in our $p$-space of the second kind (43.3) are also
orthogonal. Evidently the transformation is unique, because $\mathbf{c}$ has only one set of latent
roots except in degenerate cases. Hence our transformation is the only one which
simultaneously produces orthogonality in both the $p$-spaces.


43.6 Consider the variance of $\xi_j$,
\[ \operatorname{var} \xi_j = \mathbf{l}_j'\,\mathbf{c}\,\mathbf{l}_j, \tag{43.14} \]

where $\mathbf{l}_j$ is the $j$th column vector in $\mathbf{l}$. Suppose that we seek to maximize this, subject to
the orthogonality condition
\[ \mathbf{l}_j'\mathbf{l}_j = 1. \]
With a Lagrange multiplier $\lambda$ we then have to maximize unconditionally
\[ \mathbf{l}_j'\,\mathbf{c}\,\mathbf{l}_j - \lambda\,\mathbf{l}_j'\mathbf{l}_j \tag{43.15} \]
which, on differentiation by $l_{jk}$, $k = 1, 2, \ldots, p$, leads to a set of equations summarized
by
\[ (\mathbf{c} - \lambda\mathbf{I})\,\mathbf{l}_j = 0. \tag{43.16} \]

Comparison with (43.13) shows that the values of $\lambda$ are again the latent roots.
From (43.14) and (43.16) we find that the maximum value of $\operatorname{var} \xi_j$ is in fact the
corresponding latent root. Our new variable $\xi_1$ thus has the property of possessing the
greatest variance of any linear function of the x's. $\xi_2$ will have the greatest variance
among linear functions orthogonal to (uncorrelated with) $\xi_1$; and so on.


43.7 It is instructive to consider the same problem geometrically. Consider the
$n$ sample points in our first type of $p$-space, measured about their mean and standardized
so as to have unit variance. Thus
\[ \sum_{a=1}^{n} x_{aj} = 0; \tag{43.17} \]
\[ \sum_{a=1}^{n} x_{aj}^2 = 1. \tag{43.18} \]

Take a line with current co-ordinates $X$ and direction cosines $u_j$,
\[ \frac{X_1 - m_1}{u_1} = \frac{X_2 - m_2}{u_2} = \ldots = \frac{X_p - m_p}{u_p}. \tag{43.19} \]
The sum of squares of distances of the $n$ points from this line is given by $S$, say,
where
\[
S = \sum_{a=1}^{n}\left\{ \sum_{j=1}^{p} (x_{aj} - m_j)^2
      - \left[ \sum_{j=1}^{p} u_j (x_{aj} - m_j) \right]^2 \right\}. \tag{43.20}
\]
Let us evaluate $m$ and $u$ so that this is a minimum. The partial derivatives of (43.20)
with respect to each $m_j$ then vanish, giving
\[
-\sum_{a=1}^{n} (x_{aj} - m_j) + u_j \sum_{a=1}^{n}\sum_{k=1}^{p} u_k (x_{ak} - m_k) = 0. \tag{43.21}
\]
In virtue of (43.17) this reduces to
\[ \frac{m_j}{u_j} = \text{constant}, \qquad j = 1, 2, \ldots, p. \tag{43.22} \]

Hence the origin lies on the line (43.19) and we may take the $m$'s to be zero. This is
what we might expect: the line goes through the centre of gravity of the points. Then,
using (43.18) we have
\[ S = p - \sum_{a=1}^{n}\left( \sum_{j=1}^{p} u_j x_{aj} \right)^2. \tag{43.23} \]
The $u$'s are subject to the orthogonalizing condition $\sum u_j^2 = 1$. We then have to
minimize unconditionally
\[
p - \sum_{a=1}^{n}\left( \sum_{j=1}^{p} u_j x_{aj} \right)^2
  + \lambda\left( \sum_{j=1}^{p} u_j^2 - 1 \right). \tag{43.24}
\]
Differentiation by $u_i$ leads to
\[ \sum_{j=1}^{p}\sum_{a=1}^{n} x_{ai}\,x_{aj}\,u_j - \lambda u_i = 0 \tag{43.25} \]
or
\[ \sum_{j=1}^{p} r_{ij}\,u_j - \lambda u_i = 0. \tag{43.26} \]

The elimination of the $u$'s leads us back to
\[ |\mathbf{r} - \lambda\mathbf{I}| = 0. \tag{43.27} \]
Thus the appropriate $\lambda$ is a latent root of the correlation matrix. If we had not
standardized by reducing the initial variation to unit variance we should have arrived
at the latent roots of the dispersion, not the correlation, matrix.

Moreover, from (43.23) and (43.25) we find
\[ S = p - \lambda. \tag{43.28} \]
It follows that the latent root which gives the minimum value to $S$ is the largest latent
root. Our line corresponds to $\xi_1$ and is such that the sum of squares of distances
of sample points from it is a minimum.

We can now project all our points on to a hyperplane perpendicular to the line
(43.19) and repeat the process by finding a line in that hyperplane such that the sum
of squares of distances from the projected points is a minimum. Our line in the
$(p-1)$-space will be given by the second largest latent root of (43.27). This is not
immediately obvious. However, we saw in 43.5 that the second latent vector is
orthogonal to the first and hence lies in the $(p-1)$-space; and that it was derived by
maximizing a variance, which is equivalent to minimizing the sum of squares of distances from
the line.
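
The relation $S = p - \lambda$ of (43.28) can be seen at work in the simplest case $p = 2$. The sketch below uses a small set of hypothetical observations (not from the text) and relies on the fact that a $2 \times 2$ correlation matrix with off-diagonal element $r$ has latent roots $1 + r$ and $1 - r$, with latent vectors along the 45 and 135 degree directions; the function names are illustrative.

```python
import math


def standardize(values):
    """Centre a variable and scale it to unit sum of squares, as in (43.17)-(43.18)."""
    mean = sum(values) / len(values)
    dev = [v - mean for v in values]
    norm = math.sqrt(sum(d * d for d in dev))
    return [d / norm for d in dev]


def sum_sq_distances(x1, x2, angle):
    """S of (43.23) for a line through the origin at the given angle (p = 2)."""
    u1, u2 = math.cos(angle), math.sin(angle)
    return sum(a * a + b * b - (u1 * a + u2 * b) ** 2 for a, b in zip(x1, x2))


# Hypothetical illustrative data.
x1 = standardize([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = standardize([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
r = sum(a * b for a, b in zip(x1, x2))           # correlation coefficient

largest_root = 1.0 + abs(r)
best_angle = math.pi / 4 if r >= 0 else 3 * math.pi / 4
s_min = sum_sq_distances(x1, x2, best_angle)

# A coarse scan over directions: no line through the origin should do better.
scan = [sum_sq_distances(x1, x2, math.pi * k / 360) for k in range(360)]
print(f"r = {r:.4f}, S at best angle = {s_min:.4f}, p - lambda = {2 - largest_root:.4f}")
print(f"smallest S found in scan = {min(scan):.4f}")
```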


43.8 The following points may be briefly noted: 


(1) The latent roots of a dispersion matrix are all real and non-negative. This stems 
from the fact that с is non-negative definite. A formal proof will be found in 
most textbooks on matrix theory. See, however, the warning in 43.36. 

(2) In general the latent roots are unequal, but some or all of them may be equal in 
particular cases. Where equality exists, there is no criterion based on variance 
size to pick out any one (among the group with equal latent roots) as having priority. 
Any orthogonal set will do. 

(3) The sum of the latent roots, from (43.13), is the sum of the terms in the diagonal 
of the dispersion matrix, namely its trace. Likewise the product of the latent 
roots is the determinant of the dispersion matrix. 

(4) If A and B are both non-degenerate dispersion matrices the latent roots of
$|A - \lambda B| = 0$ are the same as those of $|B^{-1}A - \lambda I| = 0$. In particular, if $A = I$
we see that the latent roots of the inverse are the reciprocals of the latent roots of
the matrix.

(5) The $l$'s corresponding to any particular latent root are uniquely determined,
except that they may all have their signs reversed.
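
Point (3) can be illustrated with the correlation matrix and latent roots printed in Example 43.2. This is a sketch only: the determinant routine is an ordinary Gaussian elimination written for the purpose, and agreement can be expected only to about the accuracy of the printed figures.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-14:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


# Correlation matrix (43.29), written out in full.
r = [
    [1.000, 0.438, -0.137, 0.205, -0.178],
    [0.438, 1.000, 0.031, 0.180, -0.304],
    [-0.137, 0.031, 1.000, 0.161, 0.372],
    [0.205, 0.180, 0.161, 1.000, -0.013],
    [-0.178, -0.304, 0.372, -0.013, 1.000],
]
# Latent roots as printed in (43.30).
roots = [1.75714, 1.33070, 0.78086, 0.70916, 0.42214]

trace = sum(r[i][i] for i in range(5))
product = 1.0
for root in roots:
    product *= root

print(f"sum of roots     = {sum(roots):.5f}, trace       = {trace:.5f}")
print(f"product of roots = {product:.5f}, determinant = {determinant(r):.5f}")
```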


43.9 The question of standardization requires more attention. It has been 
customary, especially in psychological work, to standardize the dispersion matrix by 
dividing by appropriate (sample) standard deviations and hence to reduce it to the 
correlation matrix. In such a case the sum of the variances of the $\xi$'s is equal to the
dimension number p. In effect, the procedure reduces all the variables to equal 
importance as measured by scale. 

However, the latent roots and latent vectors are not invariant under changes of 
scale. In the geometrical representation of 43.7 perpendicular distances are no longer 
perpendicular. Thus, in general, we get different results according to whether a 
scale is initially imposed on the system or not. The point is illustrated in Example 
43.4 later. Whether standardization is desirable is, in the ultimate analysis, to be 
decided on non-statistical grounds. From the statistical viewpoint it is a nuisance, 
especially in sampling investigations, because it complicates the distributional theory. 


43.10 The actual solution of equation (43.13) by desk-machine is a rather tedious 
matter. For details of the iterative process involved see Kendall (1961b). The advent
of the electronic machine has altered the arithmetical situation completely, and most
machines are programmed to handle quite large matrices and print out the appropriate
latent roots and latent vectors. We shall therefore not allot space to the problem of
computation.

Example 43.2 (Lawley and Maxwell, 1963)


Five psychological tests were carried out on 123 individuals. The correlations
between scores on the tests were as follows:


Test      1        2        3        4        5
  1       1      0.438   −0.137    0.205   −0.178
  2                1      0.031    0.180   −0.304
  3                         1      0.161    0.372
  4                                  1     −0.013
  5                                           1
                                          (43.29)
A principal component analysis gives the following latent roots and vectors:

Latent                     Coefficients of
roots       x_1       x_2       x_3       x_4       x_5
1.75714    .55550    .56470   −.27000    .23572   −.49403
1.33070   −.18568   −.24745   −.66199   −.55654   −.39538
0.78086    .21597    .43969    .32041   −.78704    .19478
0.70916    .64078   −.30765   −.39839   −.01481    .57950
0.42214    .44688   −.57611    .47696   −.12266   −.47523
                                                  (43.30)
The coefficient matrix $\mathbf{l}'$ is given by the five columns on the right. For example
\[ \xi_1 = .55550x_1 + .56470x_2 - .27000x_3 + .23572x_4 - .49403x_5 \tag{43.31} \]
and, reading downwards,
\[ x_1 = .55550\xi_1 - .18568\xi_2 + .21597\xi_3 + .64078\xi_4 + .44688\xi_5. \tag{43.32} \]


The original data, of course, hardly bear five-figure accuracy in these results, but it 
is convenient to retain them for checking purposes. 

In psychological work it is customary to express these coefficients in a modified
form. Instead of the variables $\xi$ we introduce
\[ \zeta_j = \xi_j/\sqrt{\lambda_j} \tag{43.33} \]
so that the $\zeta$'s have unit variance. The matrix of coefficients of the x's in terms of
the $\zeta$'s is then

                          ζ's
x's       1         2         3         4         5
 1      .73635   −.21419    .19084    .53961    .29035
 2      .74855   −.28545    .38853   −.25908   −.37432
 3     −.35790   −.76364    .28313   −.33549    .30989
 4      .31247   −.64200   −.69548   −.01247   −.07969
 5     −.65488   −.45609    .17212    .48801   −.30877
                                               (43.34)
Thus, for example,
\[ x_1 = .73635\zeta_1 - .21419\zeta_2 + .19084\zeta_3 + .53961\zeta_4 + .29035\zeta_5. \tag{43.35} \]

These coefficients are known as factor loadings, the $\zeta$'s being regarded as factors
structural to the situation and the coefficients as weights with which they appear in
the different variables.

We will deal with questions of testing and estimation presently. Taking the
data as they stand, we see that the variables $\xi$ (which by definition are uncorrelated)
account, in turn, for 35, 27, 16, 14 and 8 per cent of the variance. These numbers
are obtained by dividing the latent roots by $p$, in this instance 5. We might, as an
approximation, be willing to omit the last variate, in which case the first four contribute
92 per cent of the variance; or even the last two variates, the other three contributing
78 per cent. But we still require measurements on all five x's to calculate these three
$\zeta$'s.
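
The percentages and the first column of loadings can be retraced from the printed latent roots and the first latent vector. The sketch below uses only those printed values; the variable names are illustrative. Note that the unrounded cumulative share of the first three components comes to about 77.4 per cent, the figure of 78 in the text being the sum of the rounded percentages.

```python
import math

# Latent roots and first latent vector as printed in (43.30).
roots = [1.75714, 1.33070, 0.78086, 0.70916, 0.42214]
first_vector = [0.55550, 0.56470, -0.27000, 0.23572, -0.49403]
p = len(roots)

# Share of the total variance (p, for a correlation matrix) due to each component.
percent = [100.0 * root / p for root in roots]

# Running totals of those shares.
cumulative = []
running = 0.0
for share in percent:
    running += share
    cumulative.append(running)

# Factor loadings on the first standardized component, cf. (43.33)-(43.34):
# each coefficient is multiplied by the square root of the latent root.
loadings_on_first = [coef * math.sqrt(roots[0]) for coef in first_vector]

print("per cent of variance:", [round(v, 1) for v in percent])
print("cumulative per cent: ", [round(v, 1) for v in cumulative])
print("loadings on first factor:", [round(v, 5) for v in loadings_on_first])
```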


43.11 We have noted that the latent roots are uniquely determined by the dispersion
matrix. From (43.16), whether relating to sample or parent, it is clear that $\mathbf{l}_j$ is also
uniquely determined, except perhaps for a change of sign, which we can always
determine by taking l as positive. Thus there is a one-to-one relation between the latent
roots and latent vectors, and the dispersion matrix and the mean vectors. Since the
sample values of the latter are the ML estimators of the corresponding parent values,
the sample values of the latent roots or vectors are ML estimators of the parent values
in normal variation.

The problem of bias has been considered by Dempster (1966). Exact expressions
are complicated.


Testing of latent roots 

43.12 An exact theory for testing latent roots is difficult to attain, for several 
reasons. Distributions are complicated; standardization procedures, as already noted, 
further complicate the issue; and we may be interested in the special cases where the 
latent vectors are indeterminate in the sense that a group of latent roots may be equal. 

Let us be clear about the kind of hypothesis which we wish to test. The first is 
whether the sort of transformation which we have been discussing is worth while at all.
This is equivalent to asking whether the latent roots are different from one another.
If they are not, the original x’s are just as good as the £'s for purposes of representation. 
To put it another way, are the x's independent? 

We arrived at a test of this hypothesis in 42.12. The criterion is then the
correlation determinant, $-n$ times the logarithm of which is distributed approximately in the
$\chi^2$ form with $\tfrac{1}{2}p(p-1)$ degrees of freedom. More accurately,
\[ -\left(n - \frac{2p+11}{6}\right)\log|\mathbf{r}| \]
is distributed as $\chi^2$.

Example 43.3

In the data of Example 43.2 the value of the correlation determinant is (being
the product of the latent roots) 0.54659. The value of $n$ is 123. Thus, approximately,
$-123 \log 0.54659 = 74.3$ is a $\chi^2$ value with 10 d.fr. This is extremely unfavourable to
the hypothesis.

For the more accurate approximation we have that
\[ -\left(n - \frac{2p+11}{6}\right)\log|\mathbf{r}| = 72.2 \]
is a $\chi^2$ also with 10 d.fr. The conclusion is unaffected.
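
The two values quoted in this example follow directly from the printed determinant. A minimal sketch of the arithmetic (natural logarithms; the variable names are illustrative):

```python
import math

n = 123            # number of individuals in Example 43.2
p = 5              # number of tests
det_r = 0.54659    # correlation determinant, the product of the latent roots

# First approximation: -n log|r|.
simple = -n * math.log(det_r)

# More accurate form: -(n - (2p + 11)/6) log|r|.
refined = -(n - (2 * p + 11) / 6.0) * math.log(det_r)

# Degrees of freedom p(p - 1)/2.
df = p * (p - 1) // 2

print(f"-n log|r|               = {simple:.1f}")
print(f"refined criterion       = {refined:.1f}")
print(f"degrees of freedom      = {df}")
```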

This test is one of independence. If we wish to test both independence and
equality of variance we use the sphericity test of 42.13 applied to the dispersion matrix $\mathbf{c}$.
The criterion is then
\[ -n\{\log|\mathbf{c}| - p\log(\operatorname{trace}\mathbf{c}/p)\} \tag{43.36} \]
with $\tfrac{1}{2}p(p+1)-1$ d.fr. For the more refined test we replace $n$, using (42.54), by
\[ n - \frac{2p^2+p+2}{6p}. \tag{43.37} \]

43.13 This kind of test reveals whether there is any point in transforming to
canonical variables $\xi$.

In one sense no test is required for non-vanishing latent roots. Any value which 
is greater than zero cannot have originated from a population in which the corres- 
ponding parent value was truly zero; for, if it had, the parent variation would lie in 
a sub-space and no sample point could arise from outside that space. This will not 
necessarily be so if the variate values are subject to errors of observation and measure- 
ment, but this case must be deferred for consideration until we examine factor analysis 
later. 


43.14 We may, however, legitimately ask this kind of question: suppose that 
certain latent roots are large and account for most of the variance; do the remaining 
values differ significantly among themselves, or could they have arisen from a complex 
in which the corresponding variables are effectively spherical or at least uncorrelated?
To put it another way, are the remaining $\lambda$'s and their associated latent vectors
distinguishable?

Bartlett (1954 and earlier papers) proposed on somewhat heuristic grounds to test
such a hypothesis by an approximate $\chi^2$ test. Suppose that we have decided to retain
the first $k$ latent roots and wish to test whether the remaining $p-k$ are equal to some
unknown value. We assume that the sampling errors are small enough, compared
with the differences among $\lambda_1, \ldots, \lambda_k$, for us to be able to set up an almost certain
correspondence between the parent $\lambda_j$ and the sample $\lambda_j$ for $j = 1, 2, \ldots, k$. Since
the dispersion determinant is the product of the latent roots it seems reasonable to
test that determinant against the one which would be reached if the last $p-k$ roots were
all equal. In the particular case of a correlation determinant this value is
\[
\lambda_1\lambda_2\cdots\lambda_k\left\{\frac{p-\lambda_1-\lambda_2-\ldots-\lambda_k}{p-k}\right\}^{p-k}. \tag{43.38}
\]
The criterion proposed is therefore the ratio of the two determinants, namely
\[
(\lambda_{k+1}\lambda_{k+2}\cdots\lambda_p)^{-1}\left\{\frac{p-\lambda_1-\lambda_2-\ldots-\lambda_k}{p-k}\right\}^{p-k}, \tag{43.39}
\]
where the $\lambda$'s are sample values. This may be regarded as the $(p-k)$th power of the
ratio of the arithmetic to the geometric mean of $\lambda_{k+1}, \ldots, \lambda_p$.

The proposal is that the logarithm of this quantity, multiplied by a factor involving
$n$, should be tested in the $\chi^2$ distribution with
\[ \tfrac{1}{2}(p-k-1)(p-k+2) \text{ d.fr.} \tag{43.40} \]
Lawley (1956a) has shown that if the multiplier is taken as
\[
n - 1 - k - \frac{1}{6}\left\{\frac{2(p-k)^2+p-k+2}{p-k}\right\}
  + \bar{\lambda}^2\sum_{j=1}^{k}\frac{1}{(\lambda_j-\bar{\lambda})^2}, \tag{43.41}
\]
$\bar{\lambda}$ being estimated as the mean of $\lambda_{k+1}, \ldots, \lambda_p$, the criterion has the correct moments
for a $\chi^2$ distribution to $O(n^{-2})$. If $\lambda_1, \ldots, \lambda_k$ are large compared to $\bar{\lambda}$ the last term
in (43.41) could be omitted.


43.15 Strictly speaking, these results apply to the dispersion matrix with units in 
the diagonals. Application to a correlation matrix is impaired by the fact that we 
standardize using the sample variance. It appears that in this case the criterion does
not follow a $\chi^2$ distribution. However, a rough test may be obtained, faute de mieux,
by using the results of 43.14 as if they applied to a correlation matrix.

In this connexion, consider again the data of Examples 43.2 and 43.3. Suppose 
we decide that the two largest roots are different enough to justify a supposition that 
they are distinct among themselves and also distinct from smaller values. 

The product of the remaining three roots is 0.23376 and their mean is 0.63739.
The multiplier of (43.41), neglecting the last two terms, is 120, and the criterion becomes
\[ 120\,\{3 \log 0.63739 - \log 0.23376\} = 12.3. \]

From (43.40) the number of degrees of freedom is 5, and the observed value exceeds 
the 5 per cent point but not the 1 per cent. We suspect that the last three roots are 
genuinely unequal. 
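
The calculation just described can be retraced as follows. This is a sketch under the same simplification as the text, namely that only the leading part of the multiplier is kept, taken here as $n - 1 - k = 120$; the two percentage points of $\chi^2$ with 5 d.fr. are standard tabulated values entered as constants, and the variable names are illustrative.

```python
import math

n, p, k = 123, 5, 2
# Latent roots as printed in (43.30).
roots = [1.75714, 1.33070, 0.78086, 0.70916, 0.42214]

tail = roots[k:]                 # the p - k roots under test
q = p - k
mean_tail = sum(tail) / q
product_tail = math.prod(tail)

# Leading part of the multiplier, as used in the text (last two terms neglected).
multiplier = n - 1 - k

# Logarithm of the (p - k)th power of the ratio of arithmetic to geometric mean.
criterion = multiplier * (q * math.log(mean_tail) - math.log(product_tail))

# Degrees of freedom from (43.40).
df = (q - 1) * (q + 2) // 2

# Tabulated upper percentage points of chi-squared with 5 d.fr.
CHI2_5_PER_CENT = 11.07
CHI2_1_PER_CENT = 15.09

print(f"mean = {mean_tail:.5f}, product = {product_tail:.5f}")
print(f"criterion = {criterion:.1f} on {df} d.fr.")
print("exceeds 5 per cent point:", criterion > CHI2_5_PER_CENT)
print("exceeds 1 per cent point:", criterion > CHI2_1_PER_CENT)
```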


Large-sample results for latent roots 

43.16 We can make further progress by considering asymptotic theory, namely 
standard errors and covariances when the parent latent roots are all distinct. The 
results were first obtained by Girshick (1939). 

It is indifferent, to our order of approximation, whether we write our formulae in 
terms of parent values or of sample values. We will use sample values. We then 
have the following relations: 

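\[ \sum_{a} l_{aj}\,l_{ak} = \delta_{jk} \tag{43.42} \]
\[ \sum_{b} c_{ab}\,l_{bj} = \lambda_j\,l_{aj} \tag{43.43} \]

and, using the Kronecker delta as before, we derive from (43.43)
\[ \sum_{a,b} c_{ab}\,l_{aj}\,l_{bk} = \lambda_j\,\delta_{jk}. \tag{43.44} \]
From (43.43) we then have
\[
\sum_{b} c_{ab}\,dl_{bj} + \sum_{b} dc_{ab}\,l_{bj} = \lambda_j\,dl_{aj} + d\lambda_j\,l_{aj}. \tag{43.45}
\]

Without loss of generality we may now suppose the axes rotated to the $\xi$-axes, in which
case $c_{jj} = \lambda_j$ and $c_{jk} = 0$, $j \neq k$. Then the first terms on each side of (43.45) cancel
and we find
\[ dc_{jj} = d\lambda_j. \tag{43.46} \]
Thus
\[ \operatorname{cov}(\lambda_j, \lambda_k) = \operatorname{cov}(c_{jj}, c_{kk}) \]
which, by use of (41.98), gives us for the normal case
\[ \operatorname{cov}(\lambda_j, \lambda_k) = \frac{2}{n}\,\lambda_j^2\,\delta_{jk}. \tag{43.47} \]
Hence $\lambda_j$, $\lambda_k$ are uncorrelated for $j \neq k$ and
\[ \operatorname{var}\lambda_j = \frac{2\lambda_j^2}{n}. \tag{43.48} \]

To our approximation this entails that
\[ \operatorname{var}(\log\lambda_j) = 2/n, \tag{43.49} \]
a convenient form since the variance does not depend on the parameters $\lambda$.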
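
The passage from (43.48) to (43.49) is the usual first-order (differential) argument, which may be sketched as follows; the numerical illustration uses $n = 123$ purely as an example of the order of magnitude.

```latex
\[
d(\log\lambda_j) = \frac{d\lambda_j}{\lambda_j}
\quad\Longrightarrow\quad
\operatorname{var}(\log\lambda_j) \approx \frac{\operatorname{var}\lambda_j}{\lambda_j^{2}}
 = \frac{2\lambda_j^{2}/n}{\lambda_j^{2}} = \frac{2}{n}.
\]
For instance, with $n = 123$ the standard error of $\log\lambda_j$ is about
$\sqrt{2/123} \approx 0.128$, whatever the value of $\lambda_j$.
```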


43.17 Once again, the results for a correlation matrix, as distinct from a
dispersion matrix, are much more complicated. We quote the results from Girshick
(1939): 


2 
cov (apd) = (Z faris (4+4) EI), (43.50) 
vari = + E t E N) (43.51) 
LJ a 


where r,, typifies correlations. 


43.18 The same method may be used to derive variances and covariances for the 
coefficients of the latent vectors. For what they are worth, we quote the results; but 
it must be remembered that in practice we should rarely wish to test an individual 
direction cosine. 


соу (Is Int) = Pid Ё #1, (43.52) 
LAQA( А Tie ip 
vat Ip = a(t I es "uh (43.53) 


while соу (l lmr) is given by (43.53) with each $, replaced by [yy lng. 


For some recent work see T. W. Anderson (1963), who proves the asymptotic normality
of the distribution of latent roots and vectors and deals with the case where some of the
parent roots are equal.


43.19 As a statistical tool, principal component analysis is, perhaps, best regarded 
as an exploratory instrument to enable us to see what is the effective number of
dimensions, or how dominant are certain linear combinations of the variables. There is
one case in which the use of the analysis has been pushed further, perhaps beyond
allowable limits. Suppose that the largest latent root is dominant, accounting for, 
say, 70 or 80 per cent of the variance. It may be a rather Procrustean procedure to 
neglect the remainder and to force the whole variation, so to speak, in the direction 
of the first latent vector. But there are occasions when we are willing to do this; for 
example, if the x’s are values of business activity indices of one kind or another, bank 
deposits, freight loadings, imports and so on, we may be willing to allow the first 
principal component to determine a single number expressing the general intensity 
of business activity. The values of $\xi_1$ then become a weighted index number of the 
constituent values of x. Whether this index remains a pure artefact, or whether it 
corresponds to some “ real ” intensity of business activity, is a matter of interpretation 
to be decided in the light of our knowledge about the economic structure of the system 
under study. 
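
As an illustration of how such an index number might be formed, the following is a minimal sketch only. The dispersion matrix, the function names and the use of power iteration are assumptions made for the example; they are not taken from the text.

```python
import math

def first_component(cov, iters=500):
    """Largest latent root and unit latent vector of a symmetric matrix,
    found by power iteration (an illustrative choice of method)."""
    p = len(cov)
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the latent root belonging to v
    root = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    return root, v

def index_number(x, means, weights):
    """Value of the first component for one observation x (weighted index)."""
    return sum(w * (xi - m) for w, xi, m in zip(weights, x, means))

# Hypothetical dispersion matrix of three activity indices
cov = [[4.0, 3.0, 2.0],
       [3.0, 4.0, 2.0],
       [2.0, 2.0, 3.0]]
root, weights = first_component(cov)
share = root / sum(cov[i][i] for i in range(3))  # proportion of total variance
```

Here `share` plays the part of the "70 or 80 per cent" in the discussion, and `weights` are the coefficients of the index.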

Kendall (1961b) has shown how a fair approximation to the ordering of a set by the 
first principal component can be attained by ranking methods. Little else seems to be 
known about distribution-free methods in the field of canonical analysis. 

Beale et al. (1967) give a computerized method of reducing dimensionality from p to 
a fixed value q, by maximizing the smallest multiple correlation coefficient of any rejected 
variable upon the selected variables. They compare their method with component 
analysis. 


Example 43.4 (Craddock, 1965 with some supplementary information kindly supplied 
by him in correspondence) 

Manley (1953 and later) has constructed a remarkably long series of monthly 
temperatures for Central England from 1680 to 1963. The data are in degrees 
Fahrenheit to the nearest tenth of a degree. The year, for the purpose of the analysis, 
was taken to run from November through the following October. 

Each year was taken as a 12-dimensional quantity, one value for each month. Thus 
no scaling problem arises. The variate values were measured from the mean of the 
whole series, not the individual monthly means. This leaves in the picture the varia- 
tion of temperature over the year, and we shall not be surprised to find annual variation 
in a dominant position. 

There were thus 283 sets of monthly mean temperatures running from November 
1680 to October 1963. 

In the treatment earlier in this chapter we have assumed the values of each com- 
ponent of x to be measured from the mean of that component. If we measure from 
some other value, our product-sums divided by n are no longer covariances but second-
order moments. The analysis remains valid, but we must expect one component, 
probably the first, to correspond to an axis from the alternative mean through the sample 
mean. 
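
The reason can be seen from a standard decomposition. In the sketch below the symbols $a$ (the alternative origin), $\bar{x}$ (the sample mean vector) and $C$ (the covariance matrix about $\bar{x}$) are notation introduced here for illustration, not the text's own:

```latex
\frac{1}{n}\sum_{i=1}^{n}(x_i - a)(x_i - a)'
  = C + (\bar{x} - a)(\bar{x} - a)',
\qquad
C = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})'.
```

When $\bar{x} - a$ is large compared with the spread described by $C$, the rank-one term dominates and the leading latent vector lies close to the direction of $\bar{x} - a$.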

The product-moment matrix is shown in Table 43.1, overleaf. 
The first ten latent roots of this matrix are as follows: 

Latent root number      Value as percentage 
                        of variance 
        1                    92.38 
        2                     2.05 
        3                     1.12 
        4                     0.98 
        5                     0.67 
        6                     0.58 
        7                     0.49 
        8                     0.45 
        9                     0.41 
       10                     0.36 

Sum of first 10              99.47 


The amount of variation accounted for by the first latent vector is unusually high, 
but there is, of course, a reason—the major variation is a seasonal movement. The 
coefficients of the first four latent vectors are given in Table 43.3 later. Plotted against 
the monthly means given in Table 43.2, they are seen to pursue an almost identical 
pattern of seasonal movement. 

In psychological or economic work we should hardly bother to consider the other 
latent roots. However, the interest of the present example is that sufficient know- 
ledge is available of the physical system which generates the data to enable some 
attempt at interpretation. Craddock, to whose paper reference may be made for 
details, identifies the second component with climatic changes in the annual mean 
temperature, and the third and fourth with patterns of variation in winter temperature. 

Table 43.2 gives the covariances, moments being measured about the monthly 
means. 

The first four latent roots of this matrix are: 

Latent root number      Value as percentage 
                        of variance 
        1                    27.50 
        2                    15.20 
        3                    10.84 
        4                    10.78 

Sum of first 4               64.32 


The picture of residual variation, after the abstraction of seasonal components, 
is now much less clear. From the coefficients of the latent vectors, which are given in 
Table 43.3, it appears that the first (whose coefficients are all positive) represents 
movement of a secular kind; the second and third indicate a harmonic movement 
over the year and, as in the analysis of Table 43.1, seem to represent a type of variation 
in winter temperature. 
It is interesting to consider what happens to this analysis if we standardize by 
reducing the covariances of Table 43.2 to correlations. The corresponding figures 
are: 

Latent root number      Value as percentage 
                        of variance 
        1                    22.54 
        2                    11.51 
        3                    10.29 
        4                     8.93 

Sum of first 4               53.27 


Although the differences are not so very large, they are appreciable. Even in 
this case, then, when all the p components of the vector are measured in the same 
units, standardization makes a difference. We should expect greater differences in 
cases where the components are measured in units of different range. 
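
The effect can be seen even in two dimensions. The sketch below is illustrative only: the dispersion matrix is hypothetical and the function names are invented for the example. It compares the share of the first latent root before and after reduction to correlations.

```python
import math

def roots_2x2(a, b, d):
    """Latent roots of the symmetric matrix [[a, b], [b, d]], largest first."""
    mean = (a + d) / 2.0
    half_gap = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    return mean + half_gap, mean - half_gap

def first_root_share(a, b, d):
    """Proportion of the total variance taken by the largest root."""
    big, small = roots_2x2(a, b, d)
    return big / (big + small)

# Hypothetical dispersion matrix: same units, unequal variances
var1, cov12, var2 = 9.0, 3.0, 4.0
r = cov12 / math.sqrt(var1 * var2)            # correlation coefficient
share_cov = first_root_share(var1, cov12, var2)
share_corr = first_root_share(1.0, r, 1.0)    # equals (1 + |r|) / 2
```

For a correlation matrix of order two the roots are $1 \pm r$, so the share depends on $r$ alone, whereas for the dispersion matrix it also reflects the inequality of the variances.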


Canonical correlations 

43.20 The transformation of a set of variables x to a canonical set is effectively 
the reduction of a quadratic form to a sum of squares by linear transformation. We 
now turn to consider the general theory of the relations between two sets of variates 
$x_1, \ldots, x_p$ and $x_{p+1}, \ldots, x_{p+q}$, where we suppose that $p \le q$. Following Hotelling 
(1936) we shall show that in general there can be found linear transformations to 
variates $\xi_1, \ldots, \xi_p$; $\xi_{p+1}, \ldots, \xi_{p+q}$ such that 


(a) all the $\xi$'s have unit variance and zero mean; 

(b) any $\xi$ in the p-group is uncorrelated with the other $\xi$'s in that group; 

(c) any $\xi$ in the q-group is uncorrelated with the other $\xi$'s in that group; 

(d) the correlation between any $\xi$ in the p-group and any $\xi$ in the q-group is zero except 
for p correlations $\rho_1, \rho_2, \ldots, \rho_p$, which may be taken to be the correlations between 
$\xi_1$ and $\xi_{p+1}$, $\xi_2$ and $\xi_{p+2}$, ..., $\xi_p$ and $\xi_{2p}$. 

The variates $\xi$ are then said to be in canonical form and the $\rho$'s are called canonical 
correlations. We have already discussed canonical correlation in the context of the 
analysis of categorized data in 33.44-9, Vol. 2. 

In the case of a single set of variables we were able to ensure that the $\xi$'s in turn 
accounted for as much as possible of the total variation. This is no longer possible 
here. The optimization is concerned with the reduction in the intercorrelations to 
a minimal set. 

We will suppose that our variables x have zero means and dispersions typified by 
$\gamma$. Those dispersions in the p-group we denote by Greek suffixes, $\gamma_{\alpha\beta}$, and those 
in the q-group by Roman suffixes, $\gamma_{ab}$. For the covariance of a p-variate and a q-variate 
we write one Greek and one Roman suffix: $\gamma_{\alpha a}$. 

To simplify the notation we will omit suffixes referring to sample labels. Indeed, 
we can go further and omit other suffixes identifying $\xi$, $\eta$ and the corresponding 
coefficients in the transformation. Consider now a particular pair of variables, one 
from each group, given by 
$$\xi = \sum_{\alpha} l_\alpha x_\alpha, \qquad \alpha = 1, 2, \ldots, p; \tag{43.54}$$
$$\eta = \sum_{a} m_a x_a, \qquad a = p+1, p+2, \ldots, p+q. \tag{43.55}$$
We take them to have unit variance and hence 
$$\sum_{\alpha,\beta} l_\alpha l_\beta \gamma_{\alpha\beta} = 1, \tag{43.56}$$
$$\sum_{a,b} m_a m_b \gamma_{ab} = 1. \tag{43.57}$$


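We now seek the condition that their correlation R is stationary for variations in the 
coefficients l and m, namely that 
$$R = \sum_{\alpha, a} l_\alpha m_a \gamma_{\alpha a} \tag{43.58}$$
is stationary. Taking two undetermined multipliers $\tfrac12\lambda$ and $\tfrac12\mu$, we then have to find an 
unconditioned stationary value of 
$$\sum l_\alpha m_a \gamma_{\alpha a} - \tfrac12\lambda \sum l_\alpha l_\beta \gamma_{\alpha\beta} - \tfrac12\mu \sum m_a m_b \gamma_{ab}. \tag{43.59}$$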
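On differentiation this leads to 
$$\sum_{\alpha} l_\alpha \gamma_{\alpha a} - \mu \sum_{b} m_b \gamma_{ab} = 0,$$
$$\sum_{a} m_a \gamma_{\alpha a} - \lambda \sum_{\beta} l_\beta \gamma_{\alpha\beta} = 0. \tag{43.60}$$
Multiplying the first equation by $m_a$ and summing, and the second by $l_\alpha$ and summing, 
we find, in virtue of (43.56)-(43.58), 
$$R = \lambda = \mu. \tag{43.61}$$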
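Equations (43.60) are then solvable for l and m if their determinant vanishes. Writing 
$\lambda$ for $\mu$, we find the determinant of order $p+q$ 
$$\begin{vmatrix} -\lambda\gamma_{\alpha\beta} & \gamma_{\alpha a} \\ \gamma_{a\beta} & -\lambda\gamma_{ab} \end{vmatrix} = 0. \tag{43.62}$$
Multiplying the first p rows by $-\lambda$ and dividing the last q columns by $-\lambda$ we find 
$$(-\lambda)^{q-p}\begin{vmatrix} \lambda^2\gamma_{\alpha\beta} & \gamma_{\alpha a} \\ \gamma_{a\beta} & \gamma_{ab} \end{vmatrix} = 0. \tag{43.63}$$
If we insert another determinant of order $p+q$ on the left of (43.63), it will still equal 
zero. We insert 
$$\begin{vmatrix} I & -\gamma_{\alpha a}\gamma_{ab}^{-1} \\ 0 & \gamma_{ab}^{-1} \end{vmatrix}.$$
Since the determinant of a product is the product of the determinants, (43.63) becomes 
$$(-\lambda)^{q-p}\begin{vmatrix} \lambda^2\gamma_{\alpha\beta} - \gamma_{\alpha a}\gamma_{ab}^{-1}\gamma_{a\beta} & 0 \\ \gamma_{ab}^{-1}\gamma_{a\beta} & I \end{vmatrix} = 0$$
or 
$$(-\lambda)^{q-p}\,\bigl|\lambda^2\gamma_{\alpha\beta} - \gamma_{\alpha a}\gamma_{ab}^{-1}\gamma_{a\beta}\bigr| = 0. \tag{43.64}$$
This is of order $p+q$ in $\lambda$. $q-p$ roots are zero. The others arise in pairs from the $p \times p$ 
determinant in (43.64). We may write the non-vanishing roots as $\pm\rho_1, \pm\rho_2, \ldots, \pm\rho_p$. 
We choose as the roots those which are not negative and proceed to prove that they are 
the canonical correlations as we have defined them. 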


43.21 We have arrived at variates $\xi$ obeying condition (a) at the beginning of 
43.20. A simple root of (43.64) substituted in (43.60) gives us the coefficients l and 
m, except for a multiplicative $-1$. For a root of multiplicity t they are determinate 
except for $t-1$ assignable constants, a result which we take without proof from the 
theory of algebraic forms. 

To complete, we need to prove that the $\xi$'s in each group are uncorrelated and that, 
apart from the canonical correlations, any $\xi$ in one group is uncorrelated with any $\xi$ 
from the other. Suppose we have a root $\rho_i$ and determine the corresponding constants 
$l_i$ and $m_i$ and hence the pair of corresponding variables $\xi_i$ and $\eta_i$. Then we have 
from (43.60) 


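$$\sum_{\alpha} l_{i\alpha}\gamma_{\alpha a} = \rho_i \sum_{b} m_{ib}\gamma_{ab}, \tag{43.65}$$
$$\sum_{a} m_{ia}\gamma_{\alpha a} = \rho_i \sum_{\beta} l_{i\beta}\gamma_{\alpha\beta}. \tag{43.66}$$
Similar equations obtain for a second pair, say $\xi_j$ and $\eta_j$. Between these four variables 
there are six correlations, of which two are $\rho_i$ and $\rho_j$. It will be enough to show 
that the other four vanish. They are 
$$E(\xi_i\xi_j) = \sum l_{i\alpha}l_{j\beta}\gamma_{\alpha\beta}, \qquad E(\eta_i\eta_j) = \sum m_{ia}m_{jb}\gamma_{ab}, \tag{43.67}$$
$$E(\xi_i\eta_j) = \sum l_{i\alpha}m_{ja}\gamma_{\alpha a}, \qquad E(\xi_j\eta_i) = \sum l_{j\alpha}m_{ia}\gamma_{\alpha a}. \tag{43.68}$$
Multiply (43.65) by $m_{ja}$ and sum. In virtue of (43.68) we have 
$$E(\xi_i\eta_j) = \rho_i E(\eta_i\eta_j). \tag{43.69}$$
Likewise from (43.66) multiplied by $l_{j\alpha}$ we find 
$$E(\xi_j\eta_i) = \rho_i E(\xi_i\xi_j). \tag{43.70}$$
Interchanging i and j, we find from (43.69) and (43.70) 
$$\rho_i E(\xi_i\xi_j) = \rho_j E(\eta_i\eta_j), \tag{43.71}$$
and interchanging i and j in this, 
$$\rho_j E(\xi_i\xi_j) = \rho_i E(\eta_i\eta_j). \tag{43.72}$$
It follows that unless $\rho_i^2 = \rho_j^2$ 
$$E(\eta_i\eta_j) = E(\xi_i\xi_j) = 0, \tag{43.73}$$
and in a similar way the other covariances may be shown to vanish. 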

We have only to round off the proof by showing that if $\rho$ is a root of multiplicity 
t the property still holds. This follows from the consideration that we may then 
choose our l's and m's to obey certain orthogonal conditions ensuring that 
$$E(\xi_i\xi_j) + E(\eta_i\eta_j) = 0.$$
It will then follow from (43.72) that each expectation vanishes unless $\rho_i = \rho_j = 0$; 
and even in this case (43.69) and (43.70) show that two expectations vanish, and we 
may then choose our assignable constants so that the others vanish. 


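43.22 When the variables are put into canonical form the dispersion matrix 
reduces to 
$$\begin{pmatrix} I_p & P & 0 \\ P & I_p & 0 \\ 0 & 0 & I_{q-p} \end{pmatrix}, \qquad P = \operatorname{diag}(\rho_1, \rho_2, \ldots, \rho_p), \tag{43.74}$$
with a determinant equal to 
$$(1-\rho_1^2)(1-\rho_2^2)\cdots(1-\rho_p^2). \tag{43.75}$$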


Example 43.5 (from Hotelling (1936), dealing with data of T. L. Kelley) 

140 seventh-grade schoolchildren were given four tests in (a) reading speed, (b) read- 
ing power, (c) arithmetic speed, and (d) arithmetic power. It is required to find canon- 
ical variates for the two reading tests and the two arithmetic tests. 

The correlations between the variates were: 

          x_1        x_2        x_3        x_4 
x_1     1.0000     0.6328     0.2412     0.0586 
x_2     0.6328     1.0000    -0.0553     0.0655 
x_3     0.2412    -0.0553     1.0000     0.4248 
x_4     0.0586     0.0655     0.4248     1.0000 


The determinant (43.62) becomes the symmetric determinant 
$$\begin{vmatrix} -\lambda & -0.6328\lambda & 0.2412 & 0.0586 \\ & -\lambda & -0.0553 & 0.0655 \\ & & -\lambda & -0.4248\lambda \\ & & & -\lambda \end{vmatrix} = 0 \tag{43.76}$$
or $0.491370\lambda^4 - 0.0788034\lambda^2 + 0.000362490 = 0$, 
giving $\lambda^2 = 0.155635$ or $0.004740$ 
with $\lambda = 0.3945$ or $0.0688$. 


To find the transformed variates themselves we use (43.60). For instance, with the 
root 0.3945 for $\lambda$, we have 
$$l_1 + 0.6328\,l_2 - 0.6114\,m_3 - 0.1485\,m_4 = 0 \tag{43.77}$$
$$0.6328\,l_1 + l_2 + 0.1402\,m_3 - 0.1660\,m_4 = 0 \tag{43.78}$$
$$-0.6114\,l_1 + 0.1402\,l_2 + m_3 + 0.4248\,m_4 = 0 \tag{43.79}$$
$$-0.1485\,l_1 - 0.1660\,l_2 + 0.4248\,m_3 + m_4 = 0. \tag{43.80}$$
The last equation is linearly dependent on the other three and so adds nothing. In 
the other three we solve for the ratios of l's and m's, finding 
$$l_1 : l_2 : m_3 : m_4 = -2.7772 : 2.2655 : -2.4404 : 1.$$
Thus the transformed variates are 
$$k_1\xi_1 = -2.7772\,x_1 + 2.2655\,x_2 \tag{43.81}$$
$$k_2\eta_1 = -2.4404\,x_3 + x_4, \tag{43.82}$$
where $k_1$ and $k_2$ may be chosen so that the variances of $\xi_1$ and $\eta_1$ are unity, if desired. 
Similar equations with the root 0.0688 will give us a further pair of canonical 
coordinates. Those we have worked out have the maximum correlation, the other pair 
having the minimum and therefore being of less interest. 
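
The roots in this example can be checked numerically. The sketch below is not Hotelling's own computation; it assumes the matrix form in which the squared canonical correlations are the latent roots of $R_{11}^{-1}R_{12}R_{22}^{-1}R_{21}$, and the helper names are invented for the illustration. It is written for the case p = q = 2 only.

```python
import math

def mat_mul(a, b):
    """Product of two matrices held as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def transpose(m):
    return [list(col) for col in zip(*m)]

def canonical_correlations_2x2(r11, r22, r12):
    """Non-negative roots rho of |rho^2 I - R11^-1 R12 R22^-1 R21| = 0."""
    prod = mat_mul(mat_mul(inverse_2x2(r11), r12),
                   mat_mul(inverse_2x2(r22), transpose(r12)))
    trace = prod[0][0] + prod[1][1]
    det = prod[0][0] * prod[1][1] - prod[0][1] * prod[1][0]
    disc = math.sqrt(trace * trace - 4.0 * det)
    return math.sqrt((trace + disc) / 2.0), math.sqrt((trace - disc) / 2.0)

# Correlations of Example 43.5: reading tests (x1, x2), arithmetic tests (x3, x4)
r11 = [[1.0, 0.6328], [0.6328, 1.0]]
r22 = [[1.0, 0.4248], [0.4248, 1.0]]
r12 = [[0.2412, 0.0586], [-0.0553, 0.0655]]
rho1, rho2 = canonical_correlations_2x2(r11, r22, r12)
```

The two values should agree, to the accuracy quoted, with the roots 0.3945 and 0.0688 found above from the quartic.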


43.23 Standard errors may be obtained in the manner of 43.16. Starting from 
$$\sum l_\alpha l_\beta c_{\alpha\beta} = 1 \tag{43.83}$$
$$\sum m_a m_b c_{ab} = 1 \tag{43.84}$$
$$\sum l_\alpha m_a c_{\alpha a} = r, \tag{43.85}$$
we differentiate to find 
$$2\sum c_{\alpha\beta}\,l_\alpha\,dl_\beta + \sum l_\alpha l_\beta\,dc_{\alpha\beta} = 0 \tag{43.86}$$
$$2\sum c_{ab}\,m_a\,dm_b + \sum m_a m_b\,dc_{ab} = 0 \tag{43.87}$$
$$dr = \sum l_\alpha m_a\,dc_{\alpha a} + \sum l_\alpha c_{\alpha a}\,dm_a + \sum m_a c_{\alpha a}\,dl_\alpha. \tag{43.88}$$
Without loss of generality we may now suppose the variables put into canonical form. 
All l's and m's except $l_1$ and $m_{p+1}$ vanish and we have 
$$2\,dl_1 + dc_{11} = 0 \tag{43.89}$$
$$2\,dm_{p+1} + dc_{p+1,\,p+1} = 0 \tag{43.90}$$
$$dr_1 = dc_{1,\,p+1} + r_1(dl_1 + dm_{p+1}). \tag{43.91}$$
Substituting from the first two in the third of these equations, we find 
$$dr_1 = dc_{1,\,p+1} - \tfrac12 r_1(dc_{11} + dc_{p+1,\,p+1}). \tag{43.92}$$
Similar equations apply to any other simple root, for example 
$$dr_2 = dc_{2,\,p+2} - \tfrac12 r_2(dc_{22} + dc_{p+2,\,p+2}). \tag{43.93}$$
Multiplying (43.92) and (43.93), taking expectations and using (41.98) we find 
$$\operatorname{cov}(r_1, r_2) = 0. \tag{43.94}$$
Likewise 
$$\operatorname{var} r_1 = \frac{1}{n}(1-\rho_1^2)^2, \tag{43.95}$$


with similar formulae for the other correlations. It is noteworthy that this is the same 
as the large-sample formula for an ordinary product-moment correlation. 


Hotelling (1936), to whom this derivation is due, showed that if p = 2, q>2 and a 
zero root accordingly has multiplicity 2, then $nr^2$ is distributed as $\chi^2$ with $t-1$ d.fr. If 
a canonical correlation vanishes and p = q, (43.95) holds, with the qualification that 
sample values near the zero root must be allowed to have positive or negative values, 
or alternatively that the distribution of r is that of the absolute value of a normal variate. 

Lawley (1959) derives expressions for the third and fourth cumulants of r. He also 
considers the variance-stabilizing transformation involving ar tanh r, but the results are 
not so satisfactory as for the product-moment correlation in 16.33 (Vol. 1). 
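
As a small numerical illustration of (43.95), the sketch below computes the large-sample standard error of a canonical correlation. Substituting the sample value for $\rho$ is an assumption of the example, as is the function name.

```python
import math

def canonical_correlation_se(rho, n):
    """Large-sample standard error implied by var r = (1 - rho^2)^2 / n."""
    return (1.0 - rho * rho) / math.sqrt(n)

# Largest root of Example 43.5, with the sample value used in place of rho
se = canonical_correlation_se(0.3945, 140)
```

The formula is the same as for an ordinary product-moment correlation, so the standard error shrinks both as n grows and as $\rho$ approaches unity.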


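43.24 It follows from (43.64) that the $\rho^2$ are the roots of the determinantal equation 
$$\bigl|\rho^2 I - \gamma_{\alpha\beta}^{-1}\gamma_{\alpha a}\gamma_{ab}^{-1}\gamma_{a\beta}\bigr| = 0 \tag{43.96}$$
or 
$$\bigl|\rho^2 I - \gamma_{ab}^{-1}\gamma_{a\beta}\gamma_{\alpha\beta}^{-1}\gamma_{\alpha a}\bigr| = 0, \tag{43.97}$$
where $\gamma_{\alpha\beta}$ is the matrix of the p variables $x_1$ to $x_p$, $\gamma_{ab}$ that of $x_{p+1}, \ldots, x_{p+q}$, $\gamma_{\alpha a}$ is 
the covariance matrix between the p variables and the q variables, and similarly for 
$\gamma_{a\beta}$. Thus the $\rho^2$ are the latent roots of the matrix product in (43.97). 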


43.25 The results of canonical correlation analysis are even more difficult to 
interpret than those of component analysis. It is best regarded as an exploratory 
tool which will give us some idea of the structure of the multivariate complex under 
study, and in any case tells us what can be the maximum amount of correlation between 
linear functions of the two groups of variables. The literature of the subject has few 
examples of useful practical application; cf. Barnett and Lewis (1963) for one in educa- 
tional research. For this reason we will pass rather quickly without proof over some 
remaining theoretical points. 


(a) For simplicity of exposition we have supposed $q \ge p$. If $q < p$ we simply reverse the 
roles of the two groups. 

(b) If we insert ML sample values for the matrices in (43.97) we obtain ML estimates 
of the canonical correlations. 

(c) Looking at the matrices entering into (43.64), we see that one is the dispersion 
of the p-group and the other (product of three) can be regarded as the contribution 
from regression of the p-group on the fixed q-group. Thus the theory of regression 
(42.15-20) applies here. The distribution of the latent roots $\rho^2$ in (43.97) is that 
of the 2's in 41.22-3, provided that the p-group and the g-group are independent, 
which unfortunately is the case of least interest. 

(d) Bartlett (1947) proposed a test, analogous to that of 43.14, based on the expression 
of the correlation determinant as the product of p factors $1-\rho_j^2$. If k canonical 
correlations have been accepted as non-zero, the criterion for testing that the 
others are zero is 
$$-\Bigl\{n - 1 - k - \tfrac12(p+q+1) + \sum_{j=1}^{k} r_j^{-2}\Bigr\} \log \prod_{j=k+1}^{p} (1 - r_j^2), \tag{43.98}$$
which is approximately a $\chi^2$ with $(p-k)(q-k)$ degrees of freedom. Lawley (1959) 
has investigated this test with reasonably satisfactory results. 
(e) In one other case some progress towards practical application can be made, namely 
when one canonical correlation is not zero but the others are. See Bartlett (1947). 

(f) Dempster (1966) has considered the removal of bias from estimates of the canonical 
correlations by Quenouille's method (cf. 17.10, Vol. 2). 

A unified exposition of canonical analysis and related multivariate problems is given 
by E. J. Williams (1967). 
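
The criterion of point (d) is easily computed once the sample canonical correlations are in hand. The following is a hedged sketch: the function name and argument order are invented for the illustration, and n is used exactly as it appears in the criterion, whatever convention the text attaches to it.

```python
import math

def bartlett_lawley_statistic(r, n, p, q, k):
    """Criterion of point (d) for testing that the canonical correlations
    after the first k are zero. r lists the p sample values, largest first."""
    multiplier = (n - 1 - k - 0.5 * (p + q + 1)
                  + sum(1.0 / (rj * rj) for rj in r[:k]))
    log_product = sum(math.log(1.0 - rj * rj) for rj in r[k:p])
    degrees_of_freedom = (p - k) * (q - k)
    return -multiplier * log_product, degrees_of_freedom

# Sample values from Example 43.5 (140 children, p = q = 2)
r = [0.3945, 0.0688]
stat_all, df_all = bartlett_lawley_statistic(r, 140, 2, 2, 0)   # all zero?
stat_rest, df_rest = bartlett_lawley_statistic(r, 140, 2, 2, 1)  # second zero?
```

Each statistic would be referred to a $\chi^2$ table with the stated degrees of freedom.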

Factor analysis 

43.26 The methods we have so far discussed in this chapter are designed to 
examine a system to see what sort of structure it may have. Those we now examine 
tackle the problem, so to speak, from the other end. We begin with some model of 
the structure. The problem is to see whether it fits the data and, if not, to modify 
it until it does. 

Specifically, we suppose as usual that we have a $p \times n$ matrix of n observations on 
a $(p \times 1)$ vector x. We suppose that the observed x's are, in fact, linear functions of 
some underlying variables $\zeta$ which are known as factors, there being $m < p$ of them. 
Thus we have 
$$x_{ji} = \sum_{k=1}^{m} l_{jk}\zeta_{ki} + \varepsilon_{ji}. \tag{43.99}$$
The coefficients l are not now the constants of a rotation to new axes. As in component 
analysis, they are referred to as factor loadings (a term surviving from early psychological 
usage for what are more familiar to the statistician as "weights"). The $\zeta$'s are 
assumed to be independent normal variables with zero mean and unit variance. Since 
our x-complex, in general, is not representable in fewer than p dimensions, an exact 
representation of x's in terms of $\zeta$'s requires an error term $\varepsilon$. As part of the model 
we suppose that $\varepsilon_j$ is independent of $\varepsilon_k$ and of all the $\zeta$'s. Our problem is to estimate 
the constants l and the variances $\sigma_j^2$ of the $\varepsilon$'s. 

This is not a regression model. Our $\zeta$'s are random variables which we do not 
regard as fixed quantities like regressors. The relationship is structural in the sense 
of Chapter 29, Vol. 2. 


43.27 The first thing to notice about the model is that it is undetermined. We 
are representing a p-dimensional complex in terms of $m+p$ random variables. In 
(43.99) there are pm constants l, mn values of $\zeta$, and pn values of $\varepsilon$. Considered as 
a set of algebraic equations, (43.99) has many solutions. We have already assumed 
that the $\zeta$'s are standardized normal and we shall also require the $\varepsilon$'s to have zero mean. 
The question is whether, in conjunction with the conditions of independence among 
and between $\zeta$'s and $\varepsilon$'s, the problem of estimating pm constants l and p constants 
$\sigma^2$ is determinate. 

Since the $\zeta$'s and $\varepsilon$'s are normal, the x's are also normal. We have then 
$$\operatorname{cov}(x_j, x_k) = E\Bigl\{\Bigl(\sum_i l_{ji}\zeta_i + \varepsilon_j\Bigr)\Bigl(\sum_i l_{ki}\zeta_i + \varepsilon_k\Bigr)\Bigr\} = \sum_i l_{ji}l_{ki}, \qquad j \neq k, \tag{43.100}$$
$$\operatorname{var} x_j = \sum_i l_{ji}^2 + \sigma_j^2. \tag{43.101}$$
We may summarize these relations in 
$$\gamma = ll' + \Sigma, \tag{43.102}$$
where l is the $p \times m$ matrix of coefficients $l_{jk}$ and $\Sigma$ is the $p \times p$ diagonal matrix of $\sigma_j^2$. 
The number of dispersions on the left in (43.102) is $\tfrac12 p(p+1)$. The number of 
constants on the right is $p(m+1)$. Thus if $m+1 > \tfrac12(p+1)$ there are not enough 
relations in (43.102) to determine the constants. We shall, in fact, normalize the 
constants l by requiring that 
$$\sum_{i=1}^{p} l_{ij}l_{ik}/\sigma_i^2 = 0, \qquad j \neq k, \tag{43.103}$$
or equivalently that 
$$l'\Sigma^{-1}l = J, \tag{43.104}$$
an $m \times m$ diagonal matrix. This imposes a further $\tfrac12 m(m-1)$ conditions on the 
constants under estimate. The equations will be indeterminate if 
$$\tfrac12 p(p+1) < p(m+1) - \tfrac12 m(m-1),$$
which reduces to 
$$(p-m)^2 < p+m. \tag{43.105}$$
We shall therefore assume that the contrary is true. 


Example 43.6 
The inequality of (43.105) reversed is equivalent to 
$$\{(p+\tfrac12) - m\}^2 > \tfrac14(8p+1). \tag{43.106}$$
For example, with p = 5 our model is indeterminate if m is greater than 2. We 
should not set up a model of a 5-dimensional complex with more than two factors. 
For p = 10 the largest admissible value of m is 5. 
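
The two numerical statements in this example can be checked directly. The sketch below is illustrative; the function names are invented here, and "admissible" is taken to mean that the reverse of (43.105) holds strictly.

```python
def is_determinate(p, m):
    """True when (p - m)^2 > p + m, the reverse of inequality (43.105)."""
    return (p - m) ** 2 > p + m

def is_determinate_alt(p, m):
    """The equivalent form (43.106)."""
    return ((p + 0.5) - m) ** 2 > (8 * p + 1) / 4.0

def largest_admissible_m(p):
    """Largest number of factors m < p for which the model is determinate
    (0 if no value of m qualifies)."""
    admissible = [m for m in range(1, p) if is_determinate(p, m)]
    return max(admissible) if admissible else 0
```

For instance, `largest_admissible_m(5)` and `largest_admissible_m(10)` reproduce the limits quoted for p = 5 and p = 10.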


43.28 The reason for imposing the orthogonality conditions (43.103) is as follows. 
Consider a non-singular orthogonal transformation of the $\zeta$'s to new variables $\eta$ given by 
$$\zeta = M\eta.$$
The variables $\eta$ will also be independent standardized normal, and in place of (43.102) 
we should have 
$$\gamma = lM(lM)' + \Sigma = lMM'l' + \Sigma = ll' + \Sigma.$$
In short, our $\zeta$'s are indeterminate within an orthogonal transformation. Equation 
(43.104) resolves the indeterminacy in a convenient way, but there are other methods 
of doing so. 
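
The indeterminacy is easily exhibited numerically. In the sketch below the loading matrix and the angle of rotation are hypothetical values chosen for illustration; the point is only that the rotated loadings reproduce the same matrix $ll'$.

```python
import math

def mat_mul(a, b):
    """Product of two matrices held as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

# Hypothetical loadings for p = 3 variables on m = 2 factors
loadings = [[0.8, 0.3],
            [0.7, -0.2],
            [0.5, 0.6]]
theta = 0.7  # arbitrary angle; any orthogonal M would serve
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]

rotated = mat_mul(loadings, rotation)                  # l M
common = mat_mul(loadings, transpose(loadings))        # l l'
common_rotated = mat_mul(rotated, transpose(rotated))  # l M M' l'
max_diff = max(abs(common[i][j] - common_rotated[i][j])
               for i in range(3) for j in range(3))
```

Since `max_diff` is zero apart from rounding, the data cannot distinguish the two sets of loadings, and some further convention is needed to fix them.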


43.29 If, as we henceforth suppose, $(p-m)^2 > p+m$ we have, in equations (43.102) 
and (43.103), more equations than constants. We cannot therefore solve them as 
a matter of algebra but require some reconciliation procedure. Following Lawley 
(Lawley and Maxwell, 1963) we shall use the method of maximum likelihood. 

One peculiar feature of this situation is that, although we are dealing with ML 
estimation in normal variation, the sample covariances of the x's are not estimators 
of the parent covariances. This is because of the constraints of the estimation 
represented by (43.102) and (43.103). If we could take the observed c's as estimators of 
$\gamma$'s we should have simply 
$$c = \hat{l}\hat{l}' + \hat{\Sigma},$$
$$J = \hat{l}'\hat{\Sigma}^{-1}\hat{l}.$$
As we shall see shortly, the second equation is true but the first is only true of the 
diagonal elements. 
We start from the logarithm of the likelihood function 
$$\log L = \text{constant} - \tfrac12 n \log|\gamma| - \tfrac12 n \sum_{j,k} \Gamma_{jk}c_{jk}, \tag{43.107}$$
where $\Gamma$ is inverse to $\gamma$. Substitution from (43.102) for $\gamma$ gives us a function which 
we maximize for variations in l and $\Sigma$. Where there is no ambiguity we omit circumflex 
accents for ease of printing. 

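Differentiation with respect to $\sigma_t^2$ gives us, after some algebra, 
$$\Gamma_{tt} - \sum_{j,k} \Gamma_{tj}c_{jk}\Gamma_{kt} = 0, \tag{43.108}$$
which is, summarized for all t, equivalent to 
$$\operatorname{diag}(\gamma^{-1} - \gamma^{-1}c\gamma^{-1}) = 0. \tag{43.109}$$
Differentiation with respect to $l_{kj}$ gives, after some reduction, 
$$\sum_{i} l_{ij}\Gamma_{ik} - \sum_{i,s,t} l_{ij}\Gamma_{is}c_{st}\Gamma_{tk} = 0,$$
which is the element in the jth row and kth column of 
$$l'\gamma^{-1} - l'\gamma^{-1}c\gamma^{-1} = 0. \tag{43.110}$$
To be consistent we must have, as well as (43.108) and (43.110), the equations (43.102) 
and (43.104) applying to ML estimators, 
$$\gamma = ll' + \Sigma, \tag{43.111}$$
$$J = l'\Sigma^{-1}l. \tag{43.112}$$
From (43.110), postmultiplying by $\gamma$, we have 
$$l' - l'\gamma^{-1}c = 0 \tag{43.113}$$
and hence 
$$l' = l'\gamma^{-1}c. \tag{43.114}$$
Premultiply (43.109) by $\gamma - ll'$, which is equal to $\Sigma$. We find 
$$\operatorname{diag}(I - c\gamma^{-1} - ll'\gamma^{-1} + ll'\gamma^{-1}c\gamma^{-1}) = 0,$$
which, in virtue of (43.113), reduces to 
$$\operatorname{diag}(I - c\gamma^{-1}) = 0. \tag{43.115}$$
Multiply this on the right by $\gamma - ll'$. We find similarly 
$$\operatorname{diag}(\gamma - c) = 0. \tag{43.116}$$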
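This is equivalent to the equations 
$$\sigma_j^2 = c_{jj} - \sum_{k} l_{jk}^2. \tag{43.117}$$
Now from (43.112), postmultiplying by $l'$, we find 
$$Jl' = l'\Sigma^{-1}ll' = l'\Sigma^{-1}(\gamma - \Sigma) = l'\Sigma^{-1}\gamma - l'. \tag{43.118}$$
Thus 
$$Jl'\gamma^{-1}c = l'\Sigma^{-1}c - l'\gamma^{-1}c,$$
which, in virtue of (43.114), reduces to 
$$Jl' = l'\Sigma^{-1}c - l',$$
giving 
$$Jl' = l'(\Sigma^{-1}c - \Sigma^{-1}\Sigma) = l'\Sigma^{-1}(c - \Sigma). \tag{43.119}$$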


43.30 The equations are still troublesome to solve. Recalling that J is diagonal, 
we see from (43.119) that its elements are the latent roots of $\Sigma^{-1}(c-\Sigma)$. One iterative 
procedure is to guess some values of $\Sigma$, determine from (43.119) the latent vectors l, 
substitute in (43.117) to improve the estimate of the $\sigma_j^2$, iterate with these improved 
estimates in (43.119), and so on. We can estimate $\gamma$ from (43.111). 

The process may, however, converge very slowly (cf. Howe, 1955) and it appears 
that on occasion the estimates of some of the $\sigma^2$ tend to zero. It cannot be said that 
this subject has been mastered. 
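
The cycle just described can be sketched for the simplest case of a single factor (m = 1). What follows is an illustration under stated assumptions, not the procedure of Lawley and Maxwell: the latent vector is found by power iteration on the symmetric form $\Sigma^{-1/2}(c-\Sigma)\Sigma^{-1/2}$, the floor of $10^{-6}$ on the error variances is an ad hoc guard against the collapse towards zero mentioned above, and the matrix c is constructed artificially from known loadings.

```python
import math

def one_factor_step(c, sigma2, power_iters=200):
    """One pass of the cycle for m = 1: latent vector from (43.119),
    then improved error variances from (43.117)."""
    p = len(c)
    s = [math.sqrt(v) for v in sigma2]
    # Symmetric matrix with the same latent roots as Sigma^-1 (c - Sigma)
    a = [[c[j][k] / (s[j] * s[k]) - (1.0 if j == k else 0.0) for k in range(p)]
         for j in range(p)]
    u = [1.0 / math.sqrt(p)] * p
    for _ in range(power_iters):
        w = [sum(a[j][k] * u[k] for k in range(p)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        u = [x / norm for x in w]
    root = sum(u[j] * sum(a[j][k] * u[k] for k in range(p)) for j in range(p))  # J
    # Scale so that l' Sigma^-1 l = J, the normalization (43.104)
    loadings = [s[j] * u[j] * math.sqrt(root) for j in range(p)]
    new_sigma2 = [max(c[j][j] - loadings[j] ** 2, 1e-6) for j in range(p)]
    return loadings, new_sigma2, root

# Artificial dispersion matrix built from hypothetical loadings
true_l = [0.9, 0.8, 0.7, 0.6]
true_sigma2 = [1.0 - x * x for x in true_l]
c = [[true_l[j] * true_l[k] + (true_sigma2[j] if j == k else 0.0)
      for k in range(4)] for j in range(4)]

sigma2 = [0.5] * 4                      # rough starting guess
for _ in range(50):
    loadings, sigma2, root = one_factor_step(c, sigma2)
```

Starting the cycle at the true error variances returns the true loadings unchanged, which is a useful check that the step is a correct reading of (43.117) and (43.119); how quickly a rough start approaches that point is a separate matter, as the text warns.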


43.31 When satisfactory estimates have been obtained, the usual type of likelihood 
ratio can be used to test whether the number m of factors which have been chosen is 
satisfactory. Under the hypothesis that there are in fact m factors, the log likelihood 
is proportional to 
$$-\tfrac12 n \log|\hat\gamma| - \tfrac12 n \operatorname{tr}(c\hat\Gamma). \tag{43.120}$$
On the hypothesis that the x's are normally and independently distributed with no 
errors $\varepsilon$, the sample dispersions are estimators of the parent values and the log 
likelihood is proportional to 
$$-\tfrac12 n \log|c| - \tfrac12 n \operatorname{tr}(cc^{-1}) = -\tfrac12 n \log|c| - \tfrac12 np.$$
Thus the ratio 
$$-n\Bigl\{\log\frac{|c|}{|\hat\gamma|} - \operatorname{tr}(c\hat\Gamma) + p\Bigr\} \tag{43.121}$$
is distributed approximately as $\chi^2$. The number of degrees of freedom is the number 
of constants fitted in the second case less those in the first, which is 
$$\tfrac12\{(p-m)^2 - (p+m)\}, \tag{43.122}$$
as we noted in (43.105). 
Bartlett (1951a) suggested that a better approximation would be obtained by using, 
instead of n in (43.121), the multiplier 
$$n' = n - \tfrac16(2p+11) - \tfrac23 m. \tag{43.123}$$
The argument is that this is known to be correct when m = 0 (42.12) and is presumably 
so when $n-m$ and $p-m$ are substituted for n and p. 
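
The degrees of freedom and Bartlett's multiplier are simple functions of n, p and m. A minimal sketch follows, with function names invented for the illustration; it also checks the substitution argument just given.

```python
def factor_test_degrees_of_freedom(p, m):
    """Degrees of freedom (43.122); the numerator is always even."""
    return ((p - m) ** 2 - (p + m)) // 2

def bartlett_multiplier(n, p, m):
    """Multiplier n' of (43.123), to be used in place of n in (43.121)."""
    return n - (2 * p + 11) / 6.0 - 2.0 * m / 3.0

df = factor_test_degrees_of_freedom(10, 5)
n_prime = bartlett_multiplier(100, 10, 5)
# Substituting n - m and p - m in the m = 0 form gives the same multiplier
n_prime_by_substitution = bartlett_multiplier(100 - 5, 10 - 5, 0)
```

The agreement of `n_prime` and `n_prime_by_substitution` is exactly the argument by which the multiplier is extended from m = 0 to general m.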

The rejection of the hypothesis by this test means that more factors are required. 
The calculation of the criterion (43.121) is tedious, and its value appears to be sensitive 
to the accuracy of the estimation of the parameters l and $\Sigma$. Lawley and Maxwell 
(1963) propose an approximate form 
$$n' \sum_{j<k} \frac{(c_{jk} - \hat\gamma_{jk})^2}{\hat\sigma_j^2\,\hat\sigma_k^2}, \tag{43.124}$$
where $n'$ is given by (43.123). 


43.32 Before the advent of the electronic computer, psychologists were compelled 
by arithmetical necessity to adopt various devices for obtaining solutions to problems 
of factor analysis. Some of these can scarcely be said to be more than measures of 
desperation; others, though difficult to validate with any degree of theoretical rigour, 
are still useful, if only as providing first approximations from which the iterative solu- 
tions of the exact equations can start. Reference for details and numerical illustrations 
may be made to Harman (1960), Kendall (1961b) and Lawley and Maxwell (1963). 


43.33 In factor analysis, as in component analysis, we emerge with expressions 
giving the variables x as weighted sums of some unobserved—usually unobservable— 
variables $\zeta$. The main difficulty, as a rule, is to know what the results mean. 
Psychologists usually try to identify the $\zeta$'s with some factors which they believe to underlie 
the structure of the system. Application of the same technique to physical systems 
very often results in weighted sums of variables to which no clear interpretation can 
be given. For this, and possibly other reasons, we may return to the model to see 
if any useful modification can be made in it. 


43.34 We recall first of all that, to arrive at a unique solution, we imposed an 
orthogonality condition (43.103) on the factor weights. There is nothing in the 
model to require this, and, having found the l's, we are at liberty to transform the $\zeta$'s 
how we like in the m-dimensional space of $\zeta$'s. We can, in short, rotate the factors. 
We can even transform them to non-independent factors. We have, so to speak, 
estimated the factor space but are not committed to any particular co-ordinate system 
within it. There are infinitely many choices, and which we take depends on non- 
statistical considerations in any particular case. Two criteria suggest themselves: 


(a) To rotate so that as many l's as possible vanish (or have some minimal property). 
This is tantamount to invoking some law of parsimony in explaining the x's in 
terms of $\zeta$'s: as few $\zeta$'s appear in the relationships as possible. 

(b) To rotate so that some factor loadings are maximized. The object here, as a rule, 
is similar, for in general, increasing the value of some l's can only be carried out 
at the expense of others; but it may also lead to the identification of a factor with 
a variable x. 


From a rather different viewpoint, but with much the same objective, we may impose 
conditions on the l's from the outset. For example, we may require that $x_1$ and $x_2$ 
involve only the factor $\zeta_1$, $x_3$ to x, the factors $\zeta_1$ and $\zeta_2$, and so on. This procedure 
is equivalent to putting certain factor loadings equal to zero a priori and is said to 
impose a structure on the system. 


43.35 It is evident that in such cases estimational problems become even more 
severe than in the standard case of 43.29, especially if we permit factors which are 
themselves correlated. We shall not enter into a discussion of these topics, which 
indeed have scarcely reached a stage of development in which a critical review of theoreti- 
cal points is possible. Once again the electronic computer has come to the aid of 
psychologists by enabling them to specify sundry criteria to determine rotations or 
structural simplification and to solve the resulting equations, but even the computer 
may find it hard to provide accurate information about the sampling distributions of 
the resulting estimators. 


43.36 A word of warning may be desirable against attempts at component or 
factor analysis of matrices which are not obtained by product-moment methods. For 
instance, the elements of a correlation matrix may be estimated by tetrachoric or biserial 
coefficients—cf. 26.27-33, Vol. 2. If they are, the matrix is not necessarily positive 
definite, and in certain cases some of the latent roots may turn out to be negative. 
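
A small example shows how this can happen. The matrix below is hypothetical: each entry is a permissible correlation taken singly, but the three together are mutually inconsistent, as may occur when coefficients are estimated pair by pair. The check uses leading principal minors (Sylvester's criterion), and the function names are invented for the illustration.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix held as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def is_positive_definite_3x3(m):
    """Sylvester's criterion: all leading principal minors are positive."""
    minor1 = m[0][0]
    minor2 = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return minor1 > 0 and minor2 > 0 and det3(m) > 0

# Hypothetical matrix of pairwise-estimated coefficients
pairwise = [[1.0, 0.9, 0.9],
            [0.9, 1.0, -0.9],
            [0.9, -0.9, 1.0]]
# A genuine product-moment correlation matrix for comparison
product_moment = [[1.0, 0.5, 0.5],
                  [0.5, 1.0, 0.5],
                  [0.5, 0.5, 1.0]]
```

The determinant of `pairwise` is negative, so the product of its latent roots is negative and at least one root must be negative; no set of real variables could have produced it by product-moment methods.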


EXERCISES 
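43.1 A p-variate complex has the following correlation matrix: 
$$\begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{p-1} \\ \rho & 1 & \rho & \cdots & \rho^{p-2} \\ \vdots & & & & \vdots \\ \rho^{p-1} & \rho^{p-2} & \rho^{p-3} & \cdots & 1 \end{pmatrix}$$
Show that the determinant of the matrix is $(1-\rho^2)^{p-1}$ and hence that the complex cannot be 
represented in fewer than p dimensions. 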


43.2 Show that if $\rho > 0$ the complex of Example (not Exercise) 43.1 has one greatest 
latent root and that all the others are equal. Verify that the sum of the latent roots is p. 


43.3 The correlation between variables j and k in a p-variate complex is $1 - |j-k|/p$. 
Show that the complex cannot be represented in fewer dimensions. For the case p = 4 show 
that the latent roots are $(2 \pm \sqrt{2})/4$ and $(6 \pm \sqrt{26})/4$. 


43.4 Show that if the latent roots of a dispersion matrix A are typified by $\lambda$, those of $A^k$ 
are $\lambda^k$. Show that for large k the matrix $A^k$ tends to have diagonals which are $\lambda_1^k$ times the 
squares of the values of the latent vector $l_1$, $\lambda_1$ being the largest latent root of A. 


43.5 In the notation of 43.20, if 
$$A = |\gamma_{\alpha\beta}|, \quad B = |\gamma_{ab}|, \quad C = \begin{vmatrix} 0 & \gamma_{\alpha a} \\ \gamma_{a\beta} & \gamma_{ab} \end{vmatrix}, \quad D = \begin{vmatrix} \gamma_{\alpha\beta} & \gamma_{\alpha a} \\ \gamma_{a\beta} & \gamma_{ab} \end{vmatrix},$$
show that the vector correlation coefficient K defined by 
$$K^2 = (-1)^p C/(AB)$$
and the square of the vector alienation coefficient, Z, defined by 
$$Z = D/(AB)$$
are invariant under linear transformations of the variables. Show also that 
$$K = \pm\prod_{j=1}^{p} \rho_j,$$
$$Z = \prod_{j=1}^{p} (1-\rho_j^2),$$
where the $\rho$'s are canonical correlations. 
(Hotelling, 1936) 


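43.6 In the notation of the previous exercise, k and z being the sample values of K and Z, 
show that if the population canonical correlations are all distinct, 
$$\operatorname{var} k = \frac{K^2}{n}\sum_{j=1}^{p}\frac{(1-\rho_j^2)^2}{\rho_j^2},$$
$$\operatorname{var} z = \frac{4Z^2}{n}\sum_{j=1}^{p}\rho_j^2,$$
$$\operatorname{cov}(k, z) = -\frac{2}{n}KZ\sum_{j=1}^{p}(1-\rho_j^2).$$
In particular, when p = 2, 
$$\operatorname{var} k = \frac{1}{n}\{(1-K^2)^2 - Z(1+K^2)\},$$
$$\operatorname{var} z = \frac{4}{n}Z^2(1 - Z + K^2),$$
$$\operatorname{cov}(k, z) = -\frac{2}{n}KZ(1 + Z - K^2).$$
(Hotelling, 1936) 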


43.7 In the previous exercise, with p = q = 2, show that, in standard measure, 
$$k = \frac{r_{13}r_{24} - r_{14}r_{23}}{\{(1-r_{12}^2)(1-r_{34}^2)\}^{1/2}}$$
and hence derive a test of the hypothesis that the "tetrad difference" $r_{13}r_{24} - r_{14}r_{23}$ is zero. 
(Hotelling, 1936) 


43.8 In the notation of Exercise 43.6, show that 
_ P геа+а+1 Га т-а+28- 0) TG0—) 
BEA = П гаа) Гана а) 


(Girshick, 1939) 


43.9 If the latent roots of a multinormal dispersion matrix are all equal, say, to unity, show
that a randomly selected one of the sample roots has mean unity and variance (p+1)/n.
(Girshick, 1939)


43.10 Show that the distribution of the tetrad difference of Exercise 43.7, denoted by u,
in samples from uncorrelated parents, is given by 
_ 36-2) T(n) (* fi (tv — uy"? dt до 
aT? ((n-1) Ju Jup (0170) 702) 


du. 
(Girshick, 1939) 
43.11 In the notation of 43.29 show that 
H-IZ-'(c-2)Z-1 
is equal to J? and hence is diagonal. 
(Lawley and Maxwell, 1963) 


43.12 If the “error ” variances in a factor analysis are at choice, show that they can be 
chosen so as to reduce the number of factors required to m if 


b2Kp—m)(p-m-1). 


43.13 Consider a factor analysis with p = 2, m = 1. Write down the likelihood function 
and show by differentiation that the ML equations are 


€ 6; 
—— Th 'һ 0 


©з ( ~a) Cal, = 0. 


сай = са 


o/h, = 9/1. 
"This is an inadmissible result for the free estimation of the four parameters. Explain the reason 
for its appearance. 


Hence that 


and thus that 


43.14 Verify the value for the determinant (43.74) given at (43.75). 


CHAPTER 44 
DISCRIMINATION AND CLASSIFICATION 


44.1 In this chapter we shall be concerned with problems of differentiating 
between two or more populations on the basis of multivariate measurements. There 
are three distinct classes of problem which are often confused: 


(a) Discrimination. We are given the existence of two or more populations and a 
sample of individuals from each. The problem is to set up a rule, based on measure- 
ments from these individuals, which will enable us to allot some new individual 
to the correct population when we do not know from which it emanates. 

(b) Classification. We are given a sample of individuals, or the whole population, and 
the problem is to classify them into groups which shall be as distinct as possible. 
In discrimination the existence of the groups is given; in classification it is a matter 
to be determined. 

(c) Dissection. We are given a sample or population and wish to divide it into groups, 
whether the border-lines of subdivision are natural or not. 


For example, given a set of individuals from two different races, we may wish to set 
up a function which will enable us to allocate any freshly observed individual to the 
correct race. This is a problem of discrimination. Or, given a population of unknown
origins, we may wish to see whether they fall into natural classes, natural in this sense 
meaning that the members in a group are close together in resemblance, but that the 
members of one group differ considerably from those of another. This is a problem 
of classification. Finally, given a set of students with observed performances at an 
examination, we may wish to divide their standard of success into firsts, seconds and 
thirds, and the points where we effect this division are entirely arbitrary. This is a 
problem of dissection, and presents itself even where the population is homogeneous. 

In this chapter we shall discuss discrimination and classification, but not dissection. 


Discrimination 

44.2 Before beginning the theory of the subject, it is worth considering whether 
the problem as we have described it makes practical sense. We are given a set of 
individuals each of which is known with certainty to belong to population A or popula- 
tion B. If we can acquire this knowledge with certainty for such a group, why not 
for any new individuals which we may meet? There are at least four types of case 
providing an answer to this question. 


(a) Lost information. We may require to be able to assign to the correct sex a number 
of human bones dug up on an archaeological site. While the beings were alive 
there would have been no problem, but the essential information has crumbled 
into dust. 


(b) Unattainable information. A sample of hospital records may provide us with data
concerning external symptoms and the existence of internal disease. Our problem
is to diagnose the disease from external symptoms without hospitalization; and 
indeed one of the main objects may be to diagnose and treat at an early stage, so that 
internal examination is avoided. 

(c) Prediction. It may have been found from past experience that we can discriminate 
between certain types of behaviour, for example of economic systems, on the basis 
of observations made at a previous point of time. We rely on observations at the 
present point of time in order to predict the behaviour in the future. 

(d) Testing to destruction. When a test of an object involves its complete destruction, 
it is desirable to find discriminators of a non-destructive kind to predict the result 
of the test. 


44.3 It is important to note that we shall, in the first case, consider the allocation 
of an individual to one of two classes without provision for suspended judgement.
That is to say, it is mandatory to assign to one of the classes, even at the risk of error.
When we make the assignment we may commit two kinds of mistake, according to the 
population to which we wrongly allocate a given member; and we shall assume in 
the first instance that the two types are equally important. 

Consider then a space W of p dimensions in which a sample member is represented 
by a point whose co-ordinates are the x-values. The two populations may be imagined
as two clusters of points (or continuous densities) which are separate (for otherwise 
they would be indistinguishable by means of x-values alone) but to some extent over- 
lapping (for otherwise there would be no problem of discrimination). We wish to set 
up a boundary in the space such that as many as possible of population 1 lie on one 
side and as many as possible of population 2 on the other. And we require the boundary 
to have a fairly simple shape. If $f_1$ and $f_2$ represent the respective frequency functions,
we require our boundary to determine a region R such that

$$\int_R f_2 \, dx = \int_{W-R} f_1 \, dx = 1 - \int_R f_1 \, dx. \tag{44.1}$$

This is equivalent to

$$\int_R (f_1 + f_2) \, dx = 1. \tag{44.2}$$

This condition means, in effect, that the probabilities of misallocation are the same for
the two kinds of error. We further wish to minimize the total error, which is equivalent
to minimizing one of the types of error, say

$$\int_R f_2 \, dx = \text{minimum}. \tag{44.3}$$

The problem is then to find an unconditional minimum of

$$\int_R f_2 \, dx - \lambda \int_R (f_1 + f_2) \, dx \tag{44.4}$$

or, equivalently, of

$$\int_R (\beta f_2 - f_1) \, dx, \tag{44.5}$$

the constants $\lambda$ or $\beta$ being determined from (44.2). This is clearly achieved by taking
into R all those points, and only those points, for which $\beta f_2 - f_1 < 0$. The boundary of
R is therefore given by

$$f_1/f_2 = \beta, \tag{44.6}$$


that is to say, by a ratio of likelihoods. 


44.4 This is evidently a reasonable criterion. We allocate a member to one 
population or the other according to its “ nearness ”; except that “ nearness” is not 
a metrical distance but a nearness in probability. The probability of misclassification 
for either type of error is given by

$$\int_{f_1/f_2 > \beta} f_2 \, dx = \int_{f_1/f_2 < \beta} f_1 \, dx. \tag{44.7}$$

We have, in fact, re-proved the Neyman-Pearson lemma of 22.10, Vol. 2.


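44.5 Now suppose that the two populations are multivariate normal with means
$\mu_1$ and $\mu_2$ and identical dispersion matrices $\gamma$. Apart from constants, the logarithm
of the ratio of likelihoods is then, with $\Gamma$ inverse to $\gamma$,

$$-\tfrac{1}{2} \sum_{j,k} \Gamma_{jk} \{(x_j - \mu_{1j})(x_k - \mu_{1k}) - (x_j - \mu_{2j})(x_k - \mu_{2k})\}
= \sum_{j,k} \Gamma_{jk} x_k (\mu_{1j} - \mu_{2j}) - \tfrac{1}{2} \sum_{j,k} \Gamma_{jk} (\mu_{1j}\mu_{1k} - \mu_{2j}\mu_{2k}). \tag{44.8}$$

The second part of this expression is a constant, and without loss of generality we can
take our boundary to be determined by

$$\sum_{j,k} \Gamma_{jk} (\mu_{1j} - \mu_{2j}) x_k = \text{constant}. \tag{44.9}$$

This is a parental form. If we are given a sample with means $\bar{x}_1$, $\bar{x}_2$ and pooled
dispersions $c_{jk}$, the sample boundary function is

$$\sum_{j,k} c^{jk} (\bar{x}_{1j} - \bar{x}_{2j}) x_k = \text{constant}, \tag{44.10}$$

where $c^{jk}$ is the element of the matrix inverse to $(c_{jk})$.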


44.6 The same result may be reached by a different route. Suppose that we
determine a linear function

$$X = \sum_{j=1}^{p} l_j x_j \tag{44.11}$$

so as to maximize the ratio of between-class to within-class variances, namely

$$\frac{\{\sum_j l_j (\mu_{1j} - \mu_{2j})\}^2}{\sum_{j,k} l_j l_k \gamma_{jk}}. \tag{44.12}$$

A differentiation with respect to $l_j$ gives us

$$\frac{\mu_{1j} - \mu_{2j}}{\sum_k l_k (\mu_{1k} - \mu_{2k})} - \frac{\sum_k l_k \gamma_{jk}}{\sum_{k,m} l_k l_m \gamma_{km}} = 0,$$

from which we have

$$l_k \propto \sum_j \Gamma_{jk} (\mu_{1j} - \mu_{2j}), \tag{44.13}$$

leading back to (44.9). Since our function X is used only to separate the two populations,
not to measure the distance between them, we may multiply it by any convenient
constant.


44.7 As we might expect from symmetry, the discriminating hyperplane (44.9)
bisects the line joining the two “centres” whose co-ordinates are $\mu_1$ and $\mu_2$. To
see this, make an orthogonal transformation of the variables to a set which are independent
and have unit variance. Since $f_1$ and $f_2$ have the same dispersion matrix,
the same transformation reduces each to the required form. The discriminating
hyperplane becomes

$$\sum_j (\mu_{1j} - \mu_{2j}) x_j = \text{constant},$$

which is perpendicular to the line joining the centres, whose direction cosines are
proportional to $\mu_{1j} - \mu_{2j}$. The distributions $f_1$ and $f_2$ are at the same time transformed
to spherically symmetric functions, and without loss of generality we may take one
co-ordinate axis along the line of means. The integrals giving the errors of classification
then reduce to univariate normal integrals and are clearly equal when the discriminating
boundary bisects the line of means.

This determines the constant in (44.10). If $\bar{X}_1$ is the mean of the left-hand side
with respect to $f_1$, and $\bar{X}_2$ is that with respect to $f_2$, the constant is halfway between them,
i.e. is equal to $\tfrac{1}{2}(\bar{X}_1 + \bar{X}_2)$. Without losing generality, we henceforth assume that
$\bar{X}_1 > \bar{X}_2$.


Example 44.1 (Fisher, 1936) 

Table 44.1 gives the measurements in centimetres of four variables on 50 flowers 
from each of three varieties of Iris, namely setosa (S), versicolor (Ve) and virginica (Vi). 
Consider the discrimination of S from Ve. The variables are:

x_1 = sepal length
x_2 = sepal width
x_3 = petal length
x_4 = petal width.

The means were (in centimetres):

Variate   Versicolor   Setosa   Difference
x_1         5.936       5.006      0.930
x_2         2.770       3.428     -0.658
x_3         4.260       1.462      2.798
x_4         1.326       0.246      1.080      (44.14)


The pooled sums of squares and products about the means were (in cm²):

          x_1        x_2        x_3        x_4
x_1     19.1434     9.0356     9.7634     3.2394
x_2                11.8658     4.6232     2.4746
x_3                           12.2978     3.8794
x_4                                       2.4604      (44.15)

[Table 44.1: Multiple measurements in taxono... (table not reproduced)]

The inverse matrix is, in cm⁻²:

          x_1          x_2          x_3          x_4
x_1     0.1187161   -0.0668666   -0.0816158    0.0396350
x_2                  0.1452736    0.0334101   -0.1107529
x_3                               0.2193614   -0.2720206
x_4                                            0.8945506      (44.16)


Questions of degrees of freedom sometimes arise in this class of work, but since 
the object of the discriminant function is to separate, it can absorb an arbitrary constant 
in the coefficients. We shall take the total sample number 100 as the number of d.fr. 
in (44.15). The values in (44.16) are then to be multiplied by 100 to get the inverse 
of the dispersion matrix. 

Using (44.10) we then find for the coefficients

$l_1 = (11.87161)(0.930) - (6.68666)(-0.658) - (8.16158)(2.798) + (3.96350)(1.080) = -3.11511$,

$l_2 = -18.39075$,

$l_3 = 22.21044$,

$l_4 = 31.47374$. (44.17)

We may multiply these coefficients by any convenient constant. Taking the coefficient
$l_1$ to be unity would, for example, give us

$$X = x_1 + 5.9037 x_2 - 7.1299 x_3 - 10.1036 x_4. \tag{44.18}$$

The mean value of X for versicolor, obtained by substituting means in (44.17), is
66.917. That for setosa is -38.424. The mid-point is 14.247. Thus for any value
of X above 14.247 we assign to versicolor; in the converse case to setosa.
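
The arithmetic of this example can be retraced with a short sketch. It assumes the pooled matrix (44.15), the means (44.14) and the divisor 100 adopted above; the helper names `solve`, `dot` and `allocate` are illustrative and not part of the text. It solves the linear equations directly rather than using the printed inverse (44.16).

```python
# Sketch: linear discriminant coefficients for the Iris data of Example 44.1.
# Pooled sums of squares and products (44.15), written out symmetrically, in cm^2.
S = [
    [19.1434, 9.0356, 9.7634, 3.2394],
    [9.0356, 11.8658, 4.6232, 2.4746],
    [9.7634, 4.6232, 12.2978, 3.8794],
    [3.2394, 2.4746, 3.8794, 2.4604],
]
mean_versicolor = [5.936, 2.770, 4.260, 1.326]
mean_setosa = [5.006, 3.428, 1.462, 0.246]
DF = 100  # divisor used in the text to turn S into a dispersion matrix


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


diff = [v - s for v, s in zip(mean_versicolor, mean_setosa)]
dispersion = [[s / DF for s in row] for row in S]
coeffs = solve(dispersion, diff)         # coefficients of the sample boundary (44.10)

x_bar_ve = dot(coeffs, mean_versicolor)  # mean of X for versicolor
x_bar_se = dot(coeffs, mean_setosa)      # mean of X for setosa
midpoint = 0.5 * (x_bar_ve + x_bar_se)


def allocate(x):
    """Assign a flower (four measurements in cm) by the mid-point rule."""
    return "versicolor" if dot(coeffs, x) > midpoint else "setosa"
```

Under these assumptions `coeffs` should agree with (44.17) to about three decimal places, and `x_bar_ve`, `x_bar_se` and `midpoint` with the values quoted above.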


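44.8 We may calculate approximately the probability of misclassification. We
have

$$\operatorname{var} X = \sum_{j,k} l_j l_k \gamma_{jk}
= \sum_{j,k,m} l_j \gamma_{jk} \Gamma_{km} (\mu_{1m} - \mu_{2m})
= \sum_j l_j (\mu_{1j} - \mu_{2j}),$$

which is estimated using (44.11) by

$$\operatorname{var} X = \bar{X}_1 - \bar{X}_2. \tag{44.19}$$

This is the estimated variance of a single value of X. If we take our critical value
of X to be $\tfrac{1}{2}(\bar{X}_1 + \bar{X}_2)$ the probability of misclassification either way is (approximately)
the probability of exceeding a normal deviation of $\tfrac{1}{2}(\bar{X}_1 + \bar{X}_2) - \bar{X}_2 = \tfrac{1}{2}(\bar{X}_1 - \bar{X}_2)$ from
a mean of zero with variance $(\bar{X}_1 - \bar{X}_2)$.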


Example 44.2

In the data of Example 44.1,

$$\bar{X}_1 = 66.917, \quad \bar{X}_2 = -38.424.$$

Hence the error of misclassification is the probability of a deviation of

$$\tfrac{1}{2}(66.917 + 38.424) = 52.67$$

or more from the mean of a normal distribution with variance 105.341. This is equal
to the probability of a deviation of 52.67 with standard deviation

$$(105.341)^{1/2} = 10.26$$

and is negligible.
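
To put a number on "negligible", the normal tail area can be evaluated directly. This is a minimal sketch under the normal approximation of 44.8; the function name `normal_upper_tail` is an illustrative choice.

```python
import math


def normal_upper_tail(z):
    """P(Z > z) for a standard normal variate, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


x_bar_1, x_bar_2 = 66.917, -38.424      # discriminant means from Example 44.1
variance = x_bar_1 - x_bar_2            # estimated var X, equation (44.19)
deviation = 0.5 * (x_bar_1 - x_bar_2)   # distance from either mean to the mid-point
z = deviation / math.sqrt(variance)     # standardized deviation, a little over 5
p_error = normal_upper_tail(z)          # approximate misclassification probability
```

The standardized deviation exceeds five standard deviations, so `p_error` is of the order of one in ten million.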


44.9 An interesting special case occurs when all the correlations between the x’s 
are equal. It is not uncommon in biological work for the correlations to be more or 
less the same in magnitude. If so, we can discriminate on two factors, size and shape, 
as follows. 

As in Example 43.1, it may be shown that if the correlations are all equal to $\rho$ the
latent roots of the correlation matrix are given by

$$\lambda_1 = 1 + (p-1)\rho \tag{44.20}$$
$$\lambda_2 = \ldots = \lambda_p = 1 - \rho. \tag{44.21}$$

The variation therefore contains one major component, the rest being isotropic. The
component corresponding to $\lambda_1$ is

$$\xi_1 = \frac{1}{\sqrt{p}} \sum_j x_j. \tag{44.22}$$

We take a “size” component proportional to this and write

$$Q = \sum_j x_j = \sqrt{p}\, \xi_1 \tag{44.23}$$

so that

$$\operatorname{var} Q = p\lambda_1 = p\{1 + (p-1)\rho\}. \tag{44.24}$$

Among the remaining components no one stands out in advance of the others.
Let us then take a set of weights $w_j$ with zero mean and define a “shape” component
by

$$P = \sum_j w_j x_j. \tag{44.25}$$

We find that

$$\operatorname{var} P = (1-\rho) \sum_j w_j^2. \tag{44.26}$$

Further,

$$\operatorname{cov}(Q, P) = \operatorname{cov}\Big(\sum_j x_j, \sum_j w_j x_j\Big)
= \sum_j w_j \operatorname{var} x_j + \sum_{j \ne k} w_j \operatorname{cov}(x_j, x_k)
= \{1 + (p-1)\rho\} \sum_j w_j
= 0. \tag{44.27}$$

The size and shape components are then uncorrelated.
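
These properties are easy to confirm numerically. The sketch below uses illustrative values of p and ρ and a hypothetical set of zero-sum weights (none of them from the text) to check that the vector of ones and any zero-sum weight vector are latent vectors with roots (44.20) and (44.21), and that the covariance of Q and P vanishes.

```python
p, rho = 4, 0.5  # illustrative values only, not taken from the text

# Equicorrelation matrix: unity on the diagonal, rho elsewhere.
R = [[1.0 if j == k else rho for k in range(p)] for j in range(p)]


def mat_vec(m, v):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def quad(u, m, v):
    """Bilinear form u' m v, i.e. cov(u'x, v'x) when m is the dispersion matrix of x."""
    return sum(a * b for a, b in zip(u, mat_vec(m, v)))


ones = [1.0] * p               # weights of the "size" component Q
w = [-0.5, -1.5, 1.2, 0.8]     # hypothetical "shape" weights, summing to zero

size_root = 1 + (p - 1) * rho  # lambda_1 of (44.20)
shape_root = 1 - rho           # repeated root of (44.21)

var_Q = quad(ones, R, ones)    # should equal p * lambda_1, as in (44.24)
var_P = quad(w, R, w)          # should equal (1 - rho) * sum of squared weights
cov_QP = quad(ones, R, w)      # should vanish, as in (44.27)
```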


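44.10 To arrive at a discriminator we take

$$w_j = \frac{\bar{x}_{1j} - \bar{x}_{2j}}{\bar{x}_1 - \bar{x}_2} - 1, \tag{44.28}$$

where $\bar{x}_1$ and $\bar{x}_2$ are the averages of the $\bar{x}_{1j}$ and $\bar{x}_{2j}$ over the p variates,
and look for a discriminator of form

$$X = \alpha Q + P, \tag{44.29}$$

so as to maximize

$$(\bar{X}_1 - \bar{X}_2)^2 / \operatorname{var} X.$$

Writing $D_P = \bar{P}_1 - \bar{P}_2$, $D_Q = \bar{Q}_1 - \bar{Q}_2$ we have then to maximize

$$\frac{(\alpha D_Q + D_P)^2}{\alpha^2 \operatorname{var} Q + 2\alpha \operatorname{cov}(Q, P) + \operatorname{var} P}. \tag{44.30}$$

Using (44.27), we find easily the solution

$$\alpha = \frac{D_Q \operatorname{var} P}{D_P \operatorname{var} Q}. \tag{44.31}$$

Substituting now the w from (44.28) in the expressions for $D_P$ and $D_Q$ we have

$$D_P = \frac{\sum_j (\bar{x}_{1j} - \bar{x}_{2j})^2}{\bar{x}_1 - \bar{x}_2} - p(\bar{x}_1 - \bar{x}_2) \tag{44.32}$$
$$\operatorname{var} P = (1-\rho) \Big\{ \frac{\sum_j (\bar{x}_{1j} - \bar{x}_{2j})^2}{(\bar{x}_1 - \bar{x}_2)^2} - p \Big\} \tag{44.33}$$
$$D_Q = p(\bar{x}_1 - \bar{x}_2) \tag{44.34}$$
$$\operatorname{var} Q = p\{1 + (p-1)\rho\}. \tag{44.35}$$

Substitution in (44.31) then gives $\alpha$ and we find

$$X = \frac{1-\rho}{1 + (p-1)\rho}\, Q + P. \tag{44.36}$$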


Example 44.3

Consider again the Iris data of Examples 44.1 and 44.2. The correlation matrix is:

          x_1        x_2        x_3        x_4
x_1        1       0.599513   0.636323   0.472011
x_2                   1       0.382719   0.457988
x_3                              1       0.705258
x_4                                         1          (44.37)

The correlations are near enough to equality to justify the use of the foregoing as
an approximation. We reduce the variables to zero means and unit variances to give

              Ve          S        Ve - S
x_1          1.0628    -1.0628     2.1256
x_2         -0.9551     0.9551    -1.9102
x_3          3.9894    -3.9894     7.9788
x_4          3.4426    -3.4426     6.8852
Sum = Q      7.5397    -7.5397    15.0794 = D_Q      (44.38)

The variance of Q is calculated as the sum of the 16 elements in (44.37) and is
given by

$$\operatorname{var} Q = 10.5076. \tag{44.39}$$

The weightings for shape are found from (44.38) by dividing the last column by
$\tfrac{1}{4}(15.0794)$ and subtracting unity, namely are

$$-0.4362, \quad -1.5067, \quad 1.1165, \quad 0.8264. \tag{44.40}$$

For the estimate of P we then find

$$P = (-0.4362 \times 1.0628) + \ldots = 8.2747 \tag{44.41}$$

and again by using (44.38)

$$\operatorname{var} P = 3.0912. \tag{44.42}$$

The covariance of Q and P does not vanish, and we calculate it from (44.38) as
0.36162. Substitution in (44.31) then gives $\alpha = 0.2412$ and our discriminator is

$$X = 0.2412\, Q + P, \tag{44.43}$$

where, it must be remembered, Q is the sum of the x's in standard measure and P is
the sum weighted by the numbers at (44.40).
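
The figures of this example can be retraced as follows. One step in the sketch is an assumption rather than something stated in the text: because the covariance of Q and P does not vanish here, the value of α is taken from the stationary point of the ratio (44.30) with the covariance term retained, which reduces to (44.31) when the covariance is zero. On the figures of this example that expression gives a value close to the quoted 0.2412. The name `quad` is illustrative.

```python
# Correlation matrix (44.37) and standardized mean differences (last column of 44.38).
R = [
    [1.0, 0.599513, 0.636323, 0.472011],
    [0.599513, 1.0, 0.382719, 0.457988],
    [0.636323, 0.382719, 1.0, 0.705258],
    [0.472011, 0.457988, 0.705258, 1.0],
]
d = [2.1256, -1.9102, 7.9788, 6.8852]
p = len(d)


def quad(u, m, v):
    """Bilinear form u' m v."""
    return sum(u[j] * m[j][k] * v[k] for j in range(p) for k in range(p))


ones = [1.0] * p
d_bar = sum(d) / p
w = [dj / d_bar - 1.0 for dj in d]           # shape weights, as at (44.40)

D_Q = sum(d)                                 # difference of the two Q means
D_P = sum(wj * dj for wj, dj in zip(w, d))   # difference of the two P means
var_Q = quad(ones, R, ones)                  # sum of the 16 elements of (44.37)
var_P = quad(w, R, w)
cov_QP = quad(ones, R, w)

# Assumed general form: stationary point of
# (alpha*D_Q + D_P)^2 / (alpha^2*var_Q + 2*alpha*cov_QP + var_P).
alpha = (D_Q * var_P - D_P * cov_QP) / (D_P * var_Q - D_Q * cov_QP)
```

The estimate of P at (44.41) is the value for versicolor, that is, half of `D_P`.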


Quadratic discriminators 
44.11 The linear discriminator (44.10) depends on the assumption that the two 
populations under comparison have the same dispersions. If this is not so our log
likelihood becomes, in an obvious notation, and ignoring constants,

$$\sum_{j,k} \Gamma_{1jk} (x_j - \mu_{1j})(x_k - \mu_{1k}) - \sum_{j,k} \Gamma_{2jk} (x_j - \mu_{2j})(x_k - \mu_{2k}). \tag{44.44}$$
The quadratic terms in x no longer cancel, and our boundary becomes a quadric in p 
dimensions. This is, in general, an awkward construct to handle, which probably 
accounts for the fact that quadratic discriminators have not come into general use. 
We can make some progress if we reduce the situation to one of size and shape. 
We now have p = 2 and the covariance terms vanish. Expression (44.44) then reduces 
to a form of type 
(у—›)#+(х—›,)® (4445) 
where y and x are linear functions of P and Q and the variances are calculable. The
discriminating boundary then becomes an ellipse (in two dimensions) and the situation 
is tractable. Reference may be made to C. A. B. Smith (1947) for an example. 
C. R. Rao (1966) considers discrimination between composite hypotheses. 


Testing of a discriminant function 

44.12 The process of testing a discriminator needs a little clarification. We 
may suspect that there is a real difference between the populations but that they are 
so close together that a discriminator is not very effective; this is measured by the errors 
of misclassification which, though minimal, may still be large. Or we may think 
that there is a larger difference between the populations, but our sample size is not 
large enough to produce a very reliable discriminator; this is really a matter of setting 
confidence intervals to the function or its coefficients. Or we may fear that the parents 
are identical and that a discriminant function is illusory. 

Tests of discriminant functions have usually been discussed in terms of the last 
of these possibilities. They are not so much tests of the functions as tests of homo- 
geneity by the use of the function. If heterogeneity is found, the function, ipso facto, 
is significant in the sense that it discriminates between real differences in an optimal
way (except that we use estimators of dispersions and means instead of the unknown
parent values). But that way may not be very good even if it is the best available.


44.13 Suppose our two populations have, in fact, identical means. The difference
of the means in the discriminator is then U, say, where

$$U = \sum_{j,k} c^{jk} (\bar{x}_{1j} - \bar{x}_{2j})(\bar{x}_{1k} - \bar{x}_{2k}). \tag{44.46}$$

The term $\bar{x}_{1j} - \bar{x}_{2j}$ is the difference of two means, each normally distributed, and is
therefore distributed like a mean about zero with twice the variance of a single mean
if the sample sizes are the same. It follows (cf. 41.17) that U is distributed as
Hotelling's $T^2/(2n-1)$, based on 2n observations. This is equivalent to the distribution
of the multiple correlation $R^2$ when the parent $R^2 = 0$, by (41.84), and a test can be carried
out by an analysis of variance. It seems preferable, however, to test homogeneity in
the manner of Chapter 42, which enables us to consider differences in means and dispersions
separately.


44.14 Since we observed in 27.28-9 that the null distribution of $R^2$ does not
require the multinormality assumption, it is no surprise that discrimination with two 
populations does not require it either. Exercise 44.10 shows how we may derive the 
boundary (44.9) from a LS analysis. 


44.15 Let us now extend the discussion to cases where the two types of error are
not equally important. There are two ways in which our previous results may require
modification:

(a) It may be known that members from population 1 have a different chance from
those of population 2 of being chosen. For example, in selecting a batch of
individuals at random to see if they have active tuberculosis, we expect to find many
times more healthy than unhealthy patients.

(b) The consequences of misallocation may be seriously different. It is less dangerous
to diagnose a healthy person as unhealthy (because the mistake is likely to be discovered
before serious harm is done) than an unhealthy person as healthy (where
the reverse may be true).

Let us suppose that the probabilities of emergence of members from our two
populations are $\pi_1$ and $\pi_2$ $(= 1 - \pi_1)$. Let us suppose further that we can attach
numerical weights to mistakes, a misclassification costing us $c_1$ and $c_2$ units respectively.
Instead of now minimizing mistakes in number we minimize cost. Then instead of
(44.3) we have to minimize

$$c_1 \pi_1 \int_{W-R} f_1 \, dx + c_2 \pi_2 \int_R f_2 \, dx
= c_1 \pi_1 + \int_R (c_2 \pi_2 f_2 - c_1 \pi_1 f_1) \, dx. \tag{44.47}$$

This is minimized when the boundary is determined by

$$\frac{c_2 \pi_2 f_2}{c_1 \pi_1 f_1} = 1. \tag{44.48}$$

Thus if we work with $\log f_1/f_2$ as discriminator the effect of introducing the prior
probabilities $\pi$ and the loss constants c is merely to add a constant to the discriminating
function, or equivalently, to displace its critical value by $\log\{(c_2 \pi_2)/(c_1 \pi_1)\}$.
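
As an illustration of this shift of the critical value, the following sketch allocates a single observation between two univariate normal populations with a common standard deviation. The univariate setting, the function names and the default arguments are assumptions made for the illustration only.

```python
import math


def log_normal_density(x, mean, sd):
    """Log density of a univariate normal distribution."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def allocate(x, mean1, mean2, sd, pi1=0.5, c1=1.0, c2=1.0):
    """Return 1 or 2, allocating x so as to minimize expected cost.

    pi1 is the prior probability of population 1; c1 (c2) is the cost of
    misclassifying a member of population 1 (2).
    """
    pi2 = 1.0 - pi1
    log_ratio = log_normal_density(x, mean1, sd) - log_normal_density(x, mean2, sd)
    critical = math.log((c2 * pi2) / (c1 * pi1))  # displacement of the critical value
    return 1 if log_ratio > critical else 2
```

With equal priors and costs the critical value is zero and the rule is the likelihood-ratio rule of 44.3. Raising `pi1` or `c1` lowers the critical value and enlarges the region allotted to population 1.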


The case of k populations 

44.16 When we proceed from discrimination between two populations to dis- 
crimination among a number of populations an essentially new point appears. As 
before, we shall endeavour to divide up the sample space into mutually exclusive 
regions, one for each population, and allot an observed member to the population in 
whose region it falls. But the boundaries of the regions are no longer determined by 
one single discriminant function. Either we must, to achieve optimal properties, have 
several functions, or, if we must have a single function, we shall have to sacrifice some 
discriminatory power. 


44.17 It will be enough for expository purposes if we consider three populations;
the generalization to k is immediate. We will also generalize to the extent of supposing
that the probabilities of occurrence of the three populations whose density
functions are $f_1, f_2, f_3$ are respectively $\pi_1, \pi_2, \pi_3$ $(\pi_1 + \pi_2 + \pi_3 = 1)$. If the corresponding
regions are $R_1, R_2, R_3$, a generalization by C. R. Rao of the Neyman-Pearson lemma
states that the errors of misclassification are a minimum if the regions are determined
by probability ratios which form a simple extension of (44.15). In fact, $R_1$ is such
that $\pi_1 f_1$ is greater than or equal to both $\pi_2 f_2$ and $\pi_3 f_3$; $R_2$ is such that $\pi_2 f_2 \ge \pi_3 f_3$
and $\pi_1 f_1$; $R_3$ is such that $\pi_3 f_3 \ge \pi_1 f_1$ and $\pi_2 f_2$.


44.18 In particular, if the three populations are normal with common dispersion
matrix $\gamma_{jk}$ and means $\mu_{1j}, \mu_{2j}, \mu_{3j}$, it follows as in the manner of 44.5 that $R_1$ must be
such that

$$\sum_{j,k} \Gamma_{jk} x_k (\mu_{1j} - \mu_{2j}) \ge \beta_{12}, \text{ say}, \tag{44.49}$$
$$\sum_{j,k} \Gamma_{jk} x_k (\mu_{1j} - \mu_{3j}) \ge \beta_{13}, \text{ say}. \tag{44.50}$$

Similarly for the other regions. In the sample, $R_1$ will be determined as the domain
lying between the two hyperplanes (44.49) and (44.50) and including the mean of
population 1; and so on. The surfaces of constant weighted probability ratio for
populations 1 and 2 are, in fact, given by

$$\log \frac{\pi_1 f_1}{\pi_2 f_2} = \sum_{j,k} \Gamma_{jk} x_k (\mu_{1j} - \mu_{2j})
- \tfrac{1}{2} \sum_{j,k} \Gamma_{jk} (\mu_{1j}\mu_{1k} - \mu_{2j}\mu_{2k}) + \log(\pi_1/\pi_2). \tag{44.51}$$

In the particular case where all the $\pi$'s are equal we may compare the three functions

$$X_1 = \sum_{j,k} \Gamma_{jk} \mu_{1j} x_k - \tfrac{1}{2} \sum_{j,k} \Gamma_{jk} \mu_{1j} \mu_{1k} \tag{44.52}$$
$$X_2 = \sum_{j,k} \Gamma_{jk} \mu_{2j} x_k - \tfrac{1}{2} \sum_{j,k} \Gamma_{jk} \mu_{2j} \mu_{2k} \tag{44.53}$$
$$X_3 = \sum_{j,k} \Gamma_{jk} \mu_{3j} x_k - \tfrac{1}{2} \sum_{j,k} \Gamma_{jk} \mu_{3j} \mu_{3k} \tag{44.54}$$

and allot a member to $R_1, R_2, R_3$ according to which of the X's is the greatest when
the sample values are substituted. For if, say, $X_1$ is the greatest, it follows from (44.51)
that $f_1 > f_2$ and $f_1 > f_3$. As usual we may substitute sample values for the unknown
parameters in these equations to get an approximate discriminator.




Example 44.4 (C. R. Rao and Slater, 1949)

A number of persons falling into certain neurotic groups obtained the following
mean scores in three tests:

Group                 Sample size          Mean score
                                       1         2         3
Anxiety state             114       2.9298    1.1667    0.7281
Hysteria                   33       3.0303    1.2424    0.5455
Psychopathy                32       3.8125    1.8438    0.8125
Obsession                  17       4.7059    1.5882    1.1176
Personality change          5       1.4000    0.2000    0.0000
Normal                     55       0.6000    0.1455    0.2182
Total                     256                                      (44.55)

The dispersion matrix within groups (250 d.fr.) was

         1           2           3
1     2.300851    0.251578    0.474169
2                 0.607466    0.035774
3                             0.595094      (44.56)

Its inverse is

         1           2           3
1     0.543234   -0.200195   -0.420813
2                 1.725807    0.055767
3                             2.012357      (44.57)

For the purposes of this example we will suppose all the $\pi$'s to be equal. The six
discriminating functions of type (44.49) are then as follows:

                            Coefficients of
                        x_1       x_2       x_3      Constant
Normal                 0.2050    0.1431    0.1947    -0.0931
Personality change     0.7204    0.0649   -0.5780    -0.5107
Anxiety state          1.0515    1.4676    0.2974    -2.5047
Hysteria               1.1678    1.5679   -0.1081    -2.7139
Psychopathy            1.3599    2.4641    0.1336    -4.9182
Obsession              1.7680    1.8611    0.3573    -5.8375      (44.58)


Here the coefficient of $x_1$ for the normal state is
(0.543234)(0.6000) - (0.200195)(0.1455) + (-0.420813)(0.2182) = 0.2050.

Suppose, for example, we had a subject with scores 1, 1, 0. The values of the
functions, in the order of (44.58), are 0.2550, 0.2746, 0.0144, 0.0218, -1.0942, -2.2084.
We assign the member to the second group, personality change. In practice, of course, 
we should do so very tentatively. The normal group is very close and there are only 
five members in the personality-change group on which the sample discriminators are 
based. 
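
The allocation in this example amounts to evaluating six linear functions and taking the largest. A minimal sketch, using the coefficients and constants of (44.58); the names `functions`, `scores` and `allocate` are illustrative.

```python
# Coefficients of (x_1, x_2, x_3) and constants of the six functions in (44.58).
functions = {
    "Normal": ((0.2050, 0.1431, 0.1947), -0.0931),
    "Personality change": ((0.7204, 0.0649, -0.5780), -0.5107),
    "Anxiety state": ((1.0515, 1.4676, 0.2974), -2.5047),
    "Hysteria": ((1.1678, 1.5679, -0.1081), -2.7139),
    "Psychopathy": ((1.3599, 2.4641, 0.1336), -4.9182),
    "Obsession": ((1.7680, 1.8611, 0.3573), -5.8375),
}


def scores(x):
    """Value of each discriminating function for a subject with test scores x."""
    return {
        group: sum(c * xi for c, xi in zip(coef, x)) + const
        for group, (coef, const) in functions.items()
    }


def allocate(x):
    """Group whose function takes the greatest value (equal prior probabilities)."""
    s = scores(x)
    return max(s, key=s.get)
```

For the subject with scores (1, 1, 0), `scores` reproduces the six values quoted above and `allocate` returns the personality-change group, with the normal group a close second.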


44.19 From the geometrical viewpoint, the discriminating functions represent hyper- 
surfaces in p dimensions. Those of 44.18 are, of course, planes. As we have seen, 
they are not orthogonal to the lines joining the means of distributions. When we have
more than two populations the means will not, in general, be collinear. We might,
however, find the line of closest fit to the k means, and use variation in the direction
of that line as a discriminator. And in fact, if we have k populations we may seek
for a function X given by

$$X = \sum_{j=1}^{p} l_j x_j$$

such that the ratio of variances between and within classes is maximized. It comes
to the same thing to maximize the ratio between classes to total variance, as in 44.6.
If A represents the dispersion matrix between classes and B the total, this is equivalent
to maximizing

$$\lambda = \frac{\sum_{j,k} A_{jk} l_j l_k}{\sum_{j,k} B_{jk} l_j l_k}, \tag{44.59}$$

which leads to

$$\sum_k (A_{jk} - \lambda B_{jk}) l_k = 0. \tag{44.60}$$

Thus the largest latent root of $|A - \lambda B| = 0$ provides our discriminator. For details
reference may be made to Bartlett (1951), E. J. Williams (1952), and Blackith (1960).
It appears to us that, in general, the use of one function for discrimination among 
several populations may be rather Procrustean unless they are so separate that almost 
any method will yield reasonable results. 


Qualitative data

44.20 Our discussion so far has been in terms of measured variables x. In
practice we frequently have to deal with situations where some or all of the variables
are qualitative. Let us consider the case where they are all qualitative. Suppose there
are p of them and the jth variable is divided into $s_j$ categories. In the given sample
there will be, say, $n_{1jk}$, $n_{2jk}$ members in the kth category of the jth variate. If $n_1$, $n_2$ are
the total sample members we allot a new member in that sub-class to population 1 or
population 2 according as

$$\frac{n_{1jk}}{n_1} \gtrless \frac{n_{2jk}}{n_2}. \tag{44.61}$$

In short, only the proportions in the class (j, k) are relevant. All the other class
frequencies tell us nothing about membership of that class.
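
The rule (44.61) needs only the two cell counts and the two sample totals. A minimal sketch follows; the function name and the treatment of ties are assumptions, since the text does not say what to do when the two proportions are equal.

```python
def allocate_by_cell(n1_cell, n1_total, n2_cell, n2_total):
    """Allocate a new member observed in a given cell, following (44.61).

    Returns 1 or 2, or None when the two sample proportions are equal
    (a tie, which the rule leaves undecided).
    """
    # Compare n1_cell/n1_total with n2_cell/n2_total by cross-multiplication,
    # which avoids floating-point division.
    left = n1_cell * n2_total
    right = n2_cell * n1_total
    if left > right:
        return 1
    if left < right:
        return 2
    return None
```

For instance, with hypothetical counts of 12 out of 40 in the first sample and 9 out of 60 in the second, the proportions are 0.30 and 0.15 and the member is allotted to population 1.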


44.21 This seems a crude method of procedure, but it is in line with the criterion
we adopted for measured variables; for equation (44.61) merely says that we allocate
a new member to the class for which it has the greater probability of occurrence.
If the categories $s_j$ can be ordered in k, that is to say if they follow a natural sequence
$s_{j1}, s_{j2}, \ldots, s_{jk}$ (as for example in an ordered categorization), it might be possible to
utilize information from cells outside the (j, k)th. So far as we know, this has not
been attempted.

If we are prepared to prescribe misclassification costs, a somewhat more sophisticated
discriminator can be set up on the criterion that a member is to be allocated so
as to minimize the cost of misclassification over the whole table. Cochran and Hopkins 
(1961) examine the procedure. See also Linhart (1959). 

Hills (1967) discusses a number of “ nearest-neighbour ” procedures and step-by- 
step methods. 


44.22 Perhaps the most troublesome case is the one in which some variables are 
measured and some qualitative. There appears as yet to be no satisfactory theory
to deal with this situation. A rather heuristic approach is to construct a score from 
the qualitative variables (e.g. by representing a dichotomy by 0, 1, a tritomy by 
—1, 0, 1, etc. and averaging over variables) and then to use that score as a measured 
variable in conjunction with the other measured variables. Alternatively, a separate 
discriminator can be constructed from the measured variables for each cell of the 
qualitative classification—a tedious procedure and one which is apt to reduce the sample 
numbers for each discriminator to a very low point of reliability. The subject would 
repay further study. 


44.23 Before proceeding to consider distribution-free methods we deal briefly with 
a few points not yet discussed: (a) reserved judgement, (b) bias in the estimation of 
misclassification errors, (c) discarding redundant variables. 


Reserved judgement 

44.24 In many, perhaps most, problems in discrimination it is wise to allow for
reserved judgement on borderline cases, and not to insist on an allocation to one of
two classes. This means, in geometrical terms, that we wish to divide the sample
space into three regions $R_1$, $R_2$ and $D_{12}$. If a member falls into $R_1$ we allocate it to
population 1; if it falls into $R_2$, to population 2. If it falls into $D_{12}$ we admit that the
data are insufficient to make a satisfactory judgement. This region, in general, will
contain members of both populations fairly intimately mixed up together, and in practice
we should probably seek for some other criterion to disentangle them.

It is not difficult to use the linear discriminator to set up the region $D_{12}$. We
merely have to decide on what misclassification probabilities are tolerable, define $R_1$
and $R_2$ in terms of them, and assign $D_{12}$ to the remainder of the sample space.


44.25 With more than two populations the number of regions becomes more
numerous. With three, for example, we may define regions of doubt $D_{12}$, $D_{23}$, $D_{31}$
in terms of the three discriminants, but these will intersect. Thus we may have a
region $D_{123}$ wherein we cannot allocate to any population; a region $D_{12}$ where we can
reject $R_3$ but cannot allocate as between $R_1$ and $R_2$; and so on. No particular difficulty
arises, at least with linear discriminators, which divide the sample space into regions
with flat boundaries. Problems of interpretation could arise with quadratic or cubic
forms.


Bias in the estimation of misclassification errors

44.26 The simplest way of estimating errors of misclassification is to apply the 
observed discriminator to each member of the sample on which it is based, and to 
observe the errors in that sample. If we were certain about the parent normality, 
equality of dispersions, and the accuracy of estimators of means and dispersions used 
in constructing the actual discriminator, we could estimate the errors theoretically as 
in Example 44.1. But if we are uncertain about the extent to which our discriminator 
is sensitive to these assumptions it is better to ascertain the errors in applying it to the 
observed sample. In fact, this is a procedure we should probably wish to follow in any 
case, as a precautionary check. It may, however, involve a small bias. 


44.27 There are, in practice, two sources of error in the empirical determination 
of the misclassification error. First of all, we do not know the parent parameters and 
our discriminant is based on estimates from the sample. On the average, our empirical 
estimate of error will be greater than the true value. Secondly, our empirical estimate 
is derived from data to which it has been fitted. Consequently the empirical estimate 
will, on the average, be less than it would have been had the discriminator been applied 
to a new sample; but this itself, as we have seen, would be greater than the true value. 


Example 44.5 (Cochran and Hopkins, 1961) 

The following simple example will exhibit the effect. 

Suppose we have two populations $P_1$ and $P_2$ and a single variate which can take 
values $a_1$ and $a_2$. Let the true probabilities that a member in $P_1$ has the appropriate 
values be $\pi_1(a_1)$ and $\pi_1(a_2) = 1-\pi_1(a_1)$, and similarly for $\pi_2(a_1)$ and $\pi_2(a_2)$. If a sample 
of $n_1$ from $P_1$ bears $r_1$ values of $a_1$, the unbiassed estimator of $\pi_1(a_1)$ is $r_1/n_1$; and so 
forth. 

The allocation rule will be to place a further observation $a_1$ in $P_1$ if the corresponding 
$r_1/n_1 > r_2/n_2$, and to place an observation $a_2$ in $P_1$ if $1-r_1/n_1 > 1-r_2/n_2$. 

Now consider the case when the true probabilities are given by 


             $P_1$     $P_2$ 
    $a_1$     0.9      0.05 
    $a_2$     0.1      0.95 


Suppose we are given a random sample of one from each population. (This is very 
trivial but will suffice to make the point.) If the one in $P_1$ has value $a_1$ and that in 
$P_2$ has value $a_2$, the rule for the future is that every observation with $a_1$ is to be allocated 
to $P_1$, and every one with $a_2$ is to be allocated to $P_2$. The estimated misclassification 
probability is zero. The actual (but unobserved) probability is $\tfrac12(0.1+0.05) = 0.075$. 
Likewise if the value from $P_1$ is $a_2$ and that from $P_2$ is $a_1$ the decision rule is reversed. 
Again the estimated misclassification probability is zero. Actually it is $\tfrac12(0.9+0.95) = 0.925$. 

If both members bear the value $a_1$ we should either reserve judgement or toss up 
for the allocation. In the latter case the estimated and actual probabilities of 
misclassification are both 0.5. Similarly if both members exhibit $a_2$. 

Since we have supposed that the two members we have were chosen at random 
from their respective populations, we can average these probabilities. The results are: 


                                                   Prob. of misclassification 
    Occurrence            Prob. of occurrence      Estimated     Actual 
    $P_1(a_1)\,P_2(a_2)$    (0.9)(0.95) = 0.855      0.0           0.075 
    $P_1(a_1)\,P_2(a_1)$    (0.9)(0.05) = 0.045      0.5           0.500 
    $P_1(a_2)\,P_2(a_2)$    (0.1)(0.95) = 0.095      0.5           0.500 
    $P_1(a_2)\,P_2(a_1)$    (0.1)(0.05) = 0.005      0.0           0.925 

    Average                                        0.07          0.13875 


This, of course, is a very extreme case. In sample sizes likely to be worth 
discussing in practice the bias is much smaller. 
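
The averaging in Example 44.5 can be checked mechanically. The following is a minimal sketch, not part of the original example: the dictionary and function names are illustrative assumptions, and the only inputs are the four true probabilities and the coin-tossing convention stated in the text.

```python
from itertools import product

# True probabilities of the two variate values in each population (Example 44.5).
pi1 = {"a1": 0.9, "a2": 0.1}    # population P1
pi2 = {"a1": 0.05, "a2": 0.95}  # population P2


def error_rates(v1, v2):
    """Return (estimated, actual) misclassification probability when the single
    sample member from P1 shows value v1 and the one from P2 shows value v2."""
    if v1 == v2:
        # The sample gives no guidance: allocate by tossing a coin.
        return 0.5, 0.5
    # Rule: value v1 is allocated to P1 and value v2 to P2.
    # The rule fits the sample of two perfectly, so the estimated error is zero.
    actual = 0.5 * (pi1[v2] + pi2[v1])
    return 0.0, actual


expected_estimated = 0.0
expected_actual = 0.0
for v1, v2 in product(pi1, pi2):
    prob = pi1[v1] * pi2[v2]  # probability of drawing this pair of sample values
    estimated, actual = error_rates(v1, v2)
    expected_estimated += prob * estimated
    expected_actual += prob * actual

print(round(expected_estimated, 5), round(expected_actual, 5))
```

The two printed averages correspond to the bottom row of the table: the estimated error averages 0.07 while the actual error averages 0.13875.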

Further discussion is given by Cochran and Hopkins (1961). See also John (1961). 
For a much more comprehensive discussion of error rates see Hills (1966). If, of 
course, the initial sample on which we base our discriminator was not chosen at random, 
no quantitative estimate of bias, in general, is possible. 


Redundant variables: standard errors 

44.28 It is natural to enquire whether all the variables x which appear in our 
discriminator are necessary. One expects that discarding variables will weaken the 
discriminatory power, but the loss may be negligible. Looked at from the geometrical 
viewpoint, if our constellation of points in a p-dimensional space is satisfactorily divided 
into two by the discriminating hyperplane, the same may be true if we project on to 
one of the co-ordinate hyperplanes, in which case the variable orthogonal to that plane 
is redundant. 

There are several ways of approaching this problem. It would save a good deal 
of trouble if we could discard unrewarding variables at the outset without bringing 
them into the analysis. This, however, is a hazardous operation. Some results of 
Cochran (1964) for the normal case are given in Exercises 44.6-9. J. D. Elashoff et al. 
(1967) have shown that the simple rules emerging from these exercises fail to hold, 
even when choosing the best two of p dichotomous variables. A more direct approach 
would be to estimate the misallocation errors by omitting certain variables, but this is 
apt to be tedious if the number of variables is large. We will make a third approach 
by deriving the standard error (in large samples) of the coefficients in the linear dis- 
criminant. 

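The coefficient $l_k$ is given by 
$$ l_k = \sum_j C_{jk}\,(\bar{x}_{1j}-\bar{x}_{2j}). $$
Hence 
$$ dl_k = \sum_j \{(\bar{x}_{1j}-\bar{x}_{2j})\,dC_{jk} + C_{jk}\,d(\bar{x}_{1j}-\bar{x}_{2j})\}. \tag{44.62} $$
Likewise 
$$ dl_m = \sum_r \{(\bar{x}_{1r}-\bar{x}_{2r})\,dC_{rm} + C_{rm}\,d(\bar{x}_{1r}-\bar{x}_{2r})\}. \tag{44.63} $$
Hence 
$$
\begin{aligned}
dl_k\,dl_m = \sum_{j,r} \{ & (\bar{x}_{1j}-\bar{x}_{2j})(\bar{x}_{1r}-\bar{x}_{2r})\,dC_{jk}\,dC_{rm}
 + (\bar{x}_{1j}-\bar{x}_{2j})\,C_{rm}\,dC_{jk}\,d(\bar{x}_{1r}-\bar{x}_{2r}) \\
 & + (\bar{x}_{1r}-\bar{x}_{2r})\,C_{jk}\,dC_{rm}\,d(\bar{x}_{1j}-\bar{x}_{2j})
 + C_{jk}C_{rm}\,d(\bar{x}_{1j}-\bar{x}_{2j})\,d(\bar{x}_{1r}-\bar{x}_{2r}) \}.
\end{aligned}
\tag{44.64}
$$
Remembering that means are independent of dispersions for normal variation, we have 
$$
\operatorname{cov}(l_k, l_m) = \sum_{j,r} \big[ (\bar{x}_{1j}-\bar{x}_{2j})(\bar{x}_{1r}-\bar{x}_{2r})\operatorname{cov}(C_{jk}, C_{rm})
 + C_{jk}C_{rm}\operatorname{cov}\{(\bar{x}_{1j}-\bar{x}_{2j}), (\bar{x}_{1r}-\bar{x}_{2r})\} \big].
\tag{44.65}
$$
If the two samples are based on $n_1$, $n_2$ observations we easily find 
$$
\operatorname{cov}\{(\bar{x}_{1j}-\bar{x}_{2j}), (\bar{x}_{1r}-\bar{x}_{2r})\}
 = \operatorname{cov}(\bar{x}_{1j}, \bar{x}_{1r}) + \operatorname{cov}(\bar{x}_{2j}, \bar{x}_{2r})
 = \Big(\frac{1}{n_1}+\frac{1}{n_2}\Big) c_{jr}.
\tag{44.66}
$$

We now require the covariance of $C_{jk}$ and $C_{rm}$. 

Let us write temporarily $\Gamma_{jk}$ for the co-factor of $c_{jk}$ in $|c|$ so that $C_{jk} = \Gamma_{jk}/|c|$. 
Then 
$$
dC_{jk} = \frac{d\Gamma_{jk}}{|c|} - \frac{\Gamma_{jk}\,d|c|}{|c|^2}
 = \frac{1}{|c|}\sum_{\alpha,\beta}\Gamma_{jk,\alpha\beta}\,dc_{\alpha\beta}
 - \frac{\Gamma_{jk}}{|c|^2}\sum_{\alpha,\beta}\Gamma_{\alpha\beta}\,dc_{\alpha\beta},
$$
where $\Gamma_{jk,\alpha\beta}$ is the co-factor of $c_{\alpha\beta}$ in $\Gamma_{jk}$, 
$$
 = \frac{1}{|c|^2}\sum_{\alpha,\beta}\big(|c|\,\Gamma_{jk,\alpha\beta} - \Gamma_{jk}\Gamma_{\alpha\beta}\big)\,dc_{\alpha\beta}.
\tag{44.67}
$$
Now in virtue of Jacobi's theorem on determinants 
$$ |c|\,\Gamma_{jk,\alpha\beta} = \Gamma_{jk}\Gamma_{\alpha\beta} - \Gamma_{j\beta}\Gamma_{\alpha k}. $$
Hence (44.67) reduces to 
$$
dC_{jk} = -\frac{1}{|c|^2}\sum_{\alpha,\beta}\Gamma_{j\beta}\Gamma_{\alpha k}\,dc_{\alpha\beta}
 = -\sum_{\alpha,\beta} C_{j\beta}C_{\alpha k}\,dc_{\alpha\beta}.
\tag{44.68}
$$
We now find, on using (41.98), 
$$ \operatorname{cov}(C_{jk}, C_{rm}) = \frac{1}{n_1+n_2}\,(C_{jm}C_{kr} + C_{jr}C_{km}). \tag{44.69} $$

Substituting from (44.66) and (44.69) in (44.65), we have 
$$
\begin{aligned}
\operatorname{cov}(l_k, l_m) &= \Big(\frac{1}{n_1}+\frac{1}{n_2}\Big)\sum_{j,r} C_{jk}C_{rm}c_{jr}
 + \frac{1}{n_1+n_2}\sum_{j,r}(\bar{x}_{1j}-\bar{x}_{2j})(\bar{x}_{1r}-\bar{x}_{2r})(C_{jm}C_{kr} + C_{jr}C_{km}) \qquad (44.70) \\
 &= \Big(\frac{1}{n_1}+\frac{1}{n_2}\Big)C_{km}
 + \frac{1}{n_1+n_2}\Big\{\sum_j(\bar{x}_{1j}-\bar{x}_{2j})C_{jm}\sum_r(\bar{x}_{1r}-\bar{x}_{2r})C_{kr}
 + C_{km}\sum_{j,r}(\bar{x}_{1j}-\bar{x}_{2j})(\bar{x}_{1r}-\bar{x}_{2r})C_{jr}\Big\} \\
 &= \Big(\frac{1}{n_1}+\frac{1}{n_2}\Big)C_{km}
 + \frac{1}{n_1+n_2}\big\{ l_k l_m + C_{km}\,(\bar{\mathbf{x}}_1-\bar{\mathbf{x}}_2)'\,\mathbf{C}\,(\bar{\mathbf{x}}_1-\bar{\mathbf{x}}_2) \big\}, \qquad (44.71)
\end{aligned}
$$
where $\mathbf{C}$ is the matrix of the $C_{jk}$. 

In particular, with k = m, and both replaced by j, 
$$
\operatorname{var} l_j = \Big(\frac{1}{n_1}+\frac{1}{n_2}\Big)C_{jj}
 + \frac{1}{n_1+n_2}\big\{ l_j^2 + C_{jj}\,(\bar{\mathbf{x}}_1-\bar{\mathbf{x}}_2)'\,\mathbf{C}\,(\bar{\mathbf{x}}_1-\bar{\mathbf{x}}_2) \big\}.
\tag{44.72}
$$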


Example 44.6 
Consider again the Iris data of Example 44.1 and let us test $l_1$. We have 
$$ n_1 = n_2 = 50, \qquad l_1 = -3.11511, $$
$$ (\bar{\mathbf{x}}_1-\bar{\mathbf{x}}_2)'\,\mathbf{C}\,(\bar{\mathbf{x}}_1-\bar{\mathbf{x}}_2) = 105.341, \qquad C_{11} = 11.87161. $$
We find from (44.72) 
$$ \operatorname{var} l_1 = 0.4749+0.0970+12.5057 = 13.0776, \qquad \text{s.e.}(l_1) = 3.62. $$

The absolute value of $l_1$ is less than the standard error, and we should consider whether $x_1$ 
can be discarded without serious loss of discriminating power. 
In point of fact, as we shall see later, a good discriminator can be based on x, alone. 
Cochran and Bliss (1948) considered the case where the effect of some variables is 
abstracted by a covariance technique, and discrimination applied to the remainder. For 
the method and a worked example, reference may be made to their paper. 
For the use of the D² statistic and discrimination generally see C. R. Rao (1952). 
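
The arithmetic of Example 44.6 is easily reproduced. The sketch below is illustrative only: the function name and argument names are assumptions, and the formula coded is the large-sample variance (44.72), with the quadratic form in the mean differences supplied as a single number.

```python
import math


def coefficient_variance_terms(l_j, C_jj, quad_form, n1, n2):
    """Three terms of the large-sample variance of a discriminant coefficient,
    following (44.72). quad_form is the quadratic form in the mean differences
    with the inverse dispersion matrix (105.341 in Example 44.6)."""
    term_means = (1.0 / n1 + 1.0 / n2) * C_jj       # sampling error of the means
    term_coefficient = l_j ** 2 / (n1 + n2)         # l_j squared term
    term_dispersion = C_jj * quad_form / (n1 + n2)  # dispersion-matrix term
    return term_means, term_coefficient, term_dispersion


terms = coefficient_variance_terms(
    l_j=-3.11511, C_jj=11.87161, quad_form=105.341, n1=50, n2=50
)
variance = sum(terms)
std_error = math.sqrt(variance)

print([round(t, 4) for t in terms], round(variance, 4), round(std_error, 2))
```

The third term dominates: nearly all the sampling variance of the coefficient comes from the estimation of the dispersion matrix, not from the means.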


Distribution-free methods 

44.29 We proceed to discuss the possibility of distribution-free methods of 
discrimination for k populations. Very little work has been done on this subject, and the 
following sections 44.29-33 should be regarded as suggestions which need to stand 
the test of experience. 

Let us revert to the representation of members as points in a p-dimensional space 
whose co-ordinates are the values of the variables $x_1, x_2, \ldots, x_p$. Confining ourselves 
for the present to two populations, we may think of one population (say A) as represented 
by crosses and the other (say B) by circles. In two dimensions the picture might 
look like Fig. 44.1. The crosses have a convex hull which we have drawn in; likewise 
for the circles. In general these two will have a common domain. 


Fig. 44.1 (see text) 


Let us consider the following rule of discrimination: 


(a) If a point falls in the A-hull but not in the B-hull we assign it to A; 
(b) If the point falls in the B-hull but not in the A-hull we assign it to B; 
(c) If the point falls into both hulls we will not assign it to either. 


The proposal is plausible but we shall not follow it up for three reasons: 


(i) The determination of the convex hulls is a problem in linear programming which 
is soluble but takes us outside our present scope; 
(ii) The method gives no guide to the treatment of new points which fall outside both 
hulls; 
(iii) The method is not truly distribution-free, because non-linear variate transforma- 
tions do not preserve the planarity of the hull boundaries. 


A count of points in the two hulls and their common part is nevertheless useful as 
giving us a measure of the degree of entanglement of the two populations—a measure, 
so to speak, of the magnitude of the discrimination problem. 


44.30 As a prelude to a distribution-free method, consider again Table 44.1, data 
for setosa and versicolor. 

The petal width of setosa has a mean value of 0.246 and a range of 0.2 to 0.6 (variance 
0.0109). That of versicolor has a mean of 1.326 and a range of 1.0 to 1.8 (variance 
0.0383). On this showing, as we have already remarked, petal width would be a 
perfectly good discriminator in itself. If we allot a new member to setosa or versicolor 
according as petal width is less than or exceeds, say, 0.9, we shall rarely make a mistake 
even if the variates are normal. 


44.31 The method we propose may be illustrated on the discrimination of versi- 
color against virginica. A casual inspection of the data shows what can be confirmed 
by tabulation, that the two differ more on petal length PL and petal width PW than 
on sepal length or width. We form a frequency distribution for PL and PW as in 
Table 44.2. 

We observe that on PL the two distributions overlap in the range 4.5 to 5.1. Outside 
this range there are 29 cases of versicolor and 34 cases of virginica. On PW there is 
overlap in the range 1.4 to 1.8, 28 cases of versicolor and 34 of virginica lying outside it. 
The total of cases lying outside the common range being 63 for PL and 62 for PW, 
we shall take as our first discriminating variable PL. 

We then lay down the following rule of discrimination: 

    PL ≤ 4.4           allot to versicolor 
    PL ≥ 5.2           allot to virginica 
    4.5 ≤ PL ≤ 5.1     refer to next variable.        (44.73) 

There are 37 cases for which PL lies in the common range 4.5 to 5.1. We take these 
cases out of Table 44.2 and construct a distribution for them in respect of PW, as in 
Table 44.3. 


Table 44.2: Frequency distributions of petal length and petal width 
for Iris versicolor and Iris virginica 


Petal hes Variate | Petal width 


Variate 
values Vers. values | Ves, Virg. 

| 
adis чү 25 
44 | 4 10 7 
45 7 m oce 3 
46 3 = us 5 
47 5 3 13 | 13 
A L2 2 14 7 1 
49 2 3 15 10 2 
50 1 3: «|. 4:6 3 1 
54 1 "E 1 1 
52 о |) a8 1 11 
$3 | 2 19 5 
5-4 2 2-0 6 
5-5 Br | 24 6 
56 6 22 3 
57 5 | 25 | 8 
58 3 24 3 
59 з | 25 | 3 

Жр = pes | I 
| so 50 50 50 


Table 44.3—Frequency distribution of 37 cases not distinguished by PL 


| Petal width 


Variate values | Vers. Virg 
7 
12 I3] 
13 2| 
14 4 | 
15 Ole 2; 
16 gue 
17 1 1 
18 lod zb 5 
19 3 
20 3 
2-1 | E 
22 | - 
23 | 1 
24 | 1 
21 16 


Proceeding as before, we see that there is a common range for PW of 1.5 to 1.8. We 
therefore add to the rule (44.73): 

    4.5 ≤ PL ≤ 5.1 
    PW ≤ 1.4           allot to versicolor 
    PW ≥ 1.9           allot to virginica 
    1.5 ≤ PW ≤ 1.8     proceed to next variable.      (44.74) 

This leaves 22 cases undecided. PL has discriminated 63 cases and PW a further 
15. We now refer to the 22 undecided cases on sepal length SL and sepal width SW. 


Table 44.4—Frequency distributions of 22 cases not distinguished by PL and PW 
Variate       Sepal length        Variate       Sepal width 
values       Vers.    Virg.       values       Vers.    Virg. 
49 1 22 1 1 
= = $ 23 2 e 
54 1 - 24 - - 
55 = = 2-5 1 1 
5:6 1 - 26 = - 
57 = = 27 1 1 
58 | - - 2-8 1 2 
59 1 1 29 1 = 
6:0 | 3 2 30 3 3 
БЧК eS 1 34 2 
62 | 1 1 32 2 
6:33 | 2 2 3:3 1 
6-4 1 - 34 1 
6-5 1 = 
6:6 - = 
67 2 - 
68 = = 
6:9 1 - 
14 8 14 8 


Table 44.5: Distribution of 16 cases not distinguished by PL, PW, SW 


Variate values       Sepal length 
                     Vers.    Virg. 
49 1 
54 1 = 
55 E E 
56 a - 
5-7 [= = 
58 | = Е 
5-9 | - 1 
6-0 2 2 
6-1 | - 1 
62 EE 1 
63 1 2 
64 = = 
6:5 es - 
66 = = 
67 1 = 
8 8 


For SL there are only 5 cases out of 14 lying outside the common range. For 
SW there are 6. We therefore take SW as our next discriminator and add to (44.74) 

    4.5 ≤ PL ≤ 5.1 
    1.5 ≤ PW ≤ 1.8 
    SW ≥ 3.1           allot to versicolor 
    SW < 3.1           proceed to next variable.      (44.75) 


Our third variable discriminates a further 6, making 84 altogether and leaving 16 
undecided. For these 16 the distribution on SL is given in Table 44.5. 
For what it is worth we may now add to (44.75) 

    4.5 ≤ PL ≤ 5.1 
    1.5 ≤ PW ≤ 1.8 
    SW < 3.1 
    SL ≥ 6.4           allot to versicolor 
    SL ≤ 5.3           allot to virginica 
    5.4 ≤ SL ≤ 6.3     undecided.                     (44.76) 


This leaves us with 87 cases decided and 13 undecided. No further discrimination 
is possible. 


44.32 The general method will now be clear. It is completely distribution-free, 
depending only on the rank order of the variate values. It brings up one by one the 
variables which are prima facie most important in the discrimination. It involves no 
arithmetic other than counting. 

On the other hand, the discrimination which results is not necessarily optimal. 
Looked at from the geometrical viewpoint, instead of a plane boundary as in Example 
44.1, we have a step-wise boundary. The discrimination on the first variable rules off 
three domains by hyperplanes orthogonal to that variable. The second variable rules 
off similarly in the region of indecision left by the first; and so on. It is possible 
that an optimal method based on distributions may leave a smaller residuum of un- 
decided cases than the one we propose; but it can do so, of course, only at the expense 
of sacrificing the distribution-free nature of the procedure. 
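
The counting procedure of 44.31-32 can be sketched in a few lines. This is an illustrative reading of the method, not a definitive implementation: the function names, the toy data, the rule of using each variable at most once, and the tie-break (the first variable examined wins) are assumptions. The real Iris measurements are not reproduced here.

```python
def overlap_range(a_values, b_values):
    """Common range of two samples on one variable, or None if there is none."""
    lo = max(min(a_values), min(b_values))
    hi = min(max(a_values), max(b_values))
    return (lo, hi) if lo <= hi else None


def sequential_discrimination(group_a, group_b):
    """group_a, group_b: lists of dicts mapping variable name -> value.
    At each step choose the unused variable with the most cases outside the
    common range, record it, and carry forward only the cases inside that range.
    Returns (steps, undecided_a, undecided_b), where each step is
    (variable, common range or None, number of cases decided)."""
    steps = []
    variables = list(group_a[0].keys())
    while group_a and group_b and variables:
        best = None
        for var in variables:
            a_vals = [m[var] for m in group_a]
            b_vals = [m[var] for m in group_b]
            rng = overlap_range(a_vals, b_vals)
            if rng is None:
                decided = len(a_vals) + len(b_vals)  # complete separation
            else:
                lo, hi = rng
                decided = sum(not (lo <= v <= hi) for v in a_vals + b_vals)
            if best is None or decided > best[2]:
                best = (var, rng, decided)
        var, rng, decided = best
        if decided == 0:
            break  # no remaining variable separates anything
        steps.append(best)
        variables.remove(var)
        if rng is None:
            group_a, group_b = [], []
        else:
            lo, hi = rng
            group_a = [m for m in group_a if lo <= m[var] <= hi]
            group_b = [m for m in group_b if lo <= m[var] <= hi]
    return steps, group_a, group_b


# Hypothetical toy data with two variables.
demo_a = [{"x": 1, "y": 5}, {"x": 2, "y": 1}, {"x": 3, "y": 2}, {"x": 4, "y": 1}]
demo_b = [{"x": 3, "y": 4}, {"x": 4, "y": 6}, {"x": 5, "y": 2}, {"x": 6, "y": 3}]
steps, rest_a, rest_b = sequential_discrimination(demo_a, demo_b)
print(steps, len(rest_a), len(rest_b))
```

In the toy data x decides four cases with common range 3 to 4, and y then separates the remaining four completely. Each recorded step corresponds to one block of a rule such as (44.73) to (44.76); which side of the common range is allotted to which population follows from the position of the two sample ranges.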


Differences in dispersion 

44.33 We may add a final word on the problem of discrimination when populations 
differ in dispersion but not in means. It is easier to point to the problem than to 
suggest a solution. Consider, for example, Fig. 44.2, where the populations have the 
same mean but different dispersions. There are clearly areas where discrimination is 
possible, but the foregoing methods fail to reveal them. 

If the configuration was the same but the figure was rotated through 45 degrees, 
we should arrive at meaningful results by the rank order method. The lines 
KK', LL’ would rule off domains outside of which crosses were, so to speak, dominant 
and likewise MM’, NN’ would define domains for the circles. The rectangle in the 
middle would be a zone of indecision, which would inevitably be large owing to the 
nature of the data. 

A heuristic procedure in such cases would be to rotate the axes (for measurable 
variables), say, by a transformation to principal components. Lubischew (1962) has 
discussed the problem in a biological context. See also Bartlett and Please (1963). 


Fig. 44.2 (see text) 


44.34 One difficulty of the foregoing method stems from its sensitivity to outlying 
values. As we have explained it, only non-overlapping regions are accepted for dis- 
crimination; for variables of effectively infinite range there tends to be more overlap as 
sample size increases. It might, therefore, be preferable to accept some misclassifica- 
tion from the outset by permitting overlap up to a specified amount; or to fit univariate 
distributions and estimate the cut-off points to a specified degree of overlap. Much 
more remains to be done in this field. 


Classification 
44.35 The problem of classification, as we define the word, is one of determining 
from empirical evidence whether individuals “group” or “cluster.” There are 
two different ways of looking at this problem, corresponding to the two kinds of space 
in which we represent the data. 

(a) Given, as usual, a p × n matrix of observations, let us consider the n sample points 
in the p-dimensional Euclidean space determined by the p variables. If these 
points, to some acceptable definition, fall into clearly distinguishable groups, 
we may say that the n individuals may be classified into those groups. Their 
“nearness” is to be considered as a function of the variate values which they bear. 

(b) In the alternative p-space embedded in an n-space the variables are represented by 
vectors. There is some interest in how far these vectors cluster, as we have seen 
in canonical analysis. In this case we are concerned with the extent to which the 
variables cluster, not the individuals. 

It would be convenient, though it is not general practice, to refer to the first type as 
classification analysis and the second as cluster analysis. In the first we accept the 
variables and try to classify individuals; in the second, which is perhaps logically anterior 
to the first, we are interested in the variables, to see, for example, whether they are all 
necessary and which are the more important for the purpose in hand. 


44.36 In either case our primary difficulty is to define what we mean by “group” 
or “cluster.” There are several ways of doing so, but they all rest on the notion of 
“nearness” or “distance.” The consequence is that we have to set up some kind 
of metric to determine the distance between two points, and then decide on a distance 
within which two points are “near.” 

For cluster analysis an obvious distance function of $x_j$ and $x_k$ is the correlation $\rho_{jk}$. 
We can regard this either as the cosine of the angle between the vectors or as the cosine 
of the distance between the end-points of the vectors on the unit hypersphere. The 
correlation matrix ρ then sets up our distance function. We have only to decide what 
values constitute nearness and how we use them to define a cluster. 


44.37 Suppose we decide that points with ρ ≥ 0.7 are near together. One manner 
of procedure is then as follows: scan the correlation matrix for pairs with correlation 
≥ 0.7. If there are none, no cluster exists. In the contrary case take one pair, say 
$x_j$, $x_k$. Examine the correlations of other variables with these two. If there is an $x_l$ 
such that the average correlation (three values) between $x_j$, $x_k$, $x_l$ is ≥ 0.7 add $x_l$ to the 
cluster. Proceed if possible to find a fourth such that the average ρ (6 values) ≥ 0.7; 
and so on until the process fails. The resulting vectors are a cluster. Putting these 
on one side, repeat the procedure with the remaining variables; and so on until the set 
is exhausted. 
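
The procedure just described can be sketched as follows. This is one possible reading, not a definitive implementation: the text leaves the choice of starting pair and of the next variable open, so the sketch assumes (hypothetically) that we start from the most highly correlated remaining pair and always add the candidate giving the highest average correlation. Function names and the demonstration matrix are illustrative.

```python
from itertools import combinations


def average_correlation(members, rho):
    """Mean of the pairwise correlations among the listed variables."""
    pairs = list(combinations(members, 2))
    return sum(rho[j][k] for j, k in pairs) / len(pairs)


def correlation_clusters(rho, threshold=0.7):
    """Greedy clustering of variables from a symmetric correlation matrix
    (a list of lists). Returns a list of clusters of variable indices."""
    remaining = list(range(len(rho)))
    clusters = []
    while True:
        seeds = [(j, k) for j, k in combinations(remaining, 2)
                 if rho[j][k] >= threshold]
        if not seeds:
            break  # no pair is "near": no further cluster exists
        # Assumption: start from the most highly correlated pair.
        cluster = list(max(seeds, key=lambda jk: rho[jk[0]][jk[1]]))
        while True:
            candidates = [v for v in remaining if v not in cluster]
            scored = [(average_correlation(cluster + [v], rho), v)
                      for v in candidates]
            scored = [s for s in scored if s[0] >= threshold]
            if not scored:
                break  # the process fails: the cluster is complete
            # Assumption: add the candidate with the highest average correlation.
            cluster.append(max(scored)[1])
        clusters.append(sorted(cluster))
        remaining = [v for v in remaining if v not in cluster]
    return clusters


# Hypothetical correlation matrix for five variables.
demo_rho = [
    [1.00, 0.90, 0.80, 0.10, 0.00],
    [0.90, 1.00, 0.75, 0.20, 0.10],
    [0.80, 0.75, 1.00, 0.00, 0.10],
    [0.10, 0.20, 0.00, 1.00, 0.85],
    [0.00, 0.10, 0.10, 0.85, 1.00],
]
demo_clusters = correlation_clusters(demo_rho)
print(demo_clusters)
```

With a different rule for choosing the starting pair the clusters may differ, which is the non-uniqueness remarked on in the text.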

The procedure is fairly easy to apply for a number of vectors of reasonable size— 
and in practice the number rarely exceeds 50. But it may not be unique in the sense 
that where we have a choice of starting pairs the ultimate result may depend on which 
we choose. If computational facilities were available, it might be possible to split 
the p vectors into groups in all possible ways, the number of non-unitary partitions of p, 
and examine the clustering within each partition. But this would probably overtax 
the capacity of the largest computer. 


For some further studies see Tryon (1939) and Fortier and Solomon (1966). The 
methods of cluster analysis have not been much used by statisticians and are worthy 
of further study, for example in the discarding of redundant variables in regression analysis, 
structure analysis, discriminant analysis and, indeed, in multivariate analysis generally. 
It must be remembered that correlation coefficients are quantities of a highly summary 
kind, and it is prudent, as a preliminary in all these cases, to draw some of the bivariate 
scatter diagrams in order to get an overall view of the nature of the variation. 

King (1967) gives two step-by-step methods of cluster formation (and empirical 
evidence that they work well, but not optimally), one of which minimizes (42.45) at each 
step at which variables are grouped. H. P. Friedman and Rubin (1967) minimize (42.17) 
to determine the clusters. 


44.38 The method of cluster analysis by correlations has the advantage of being 
independent of the scale of measurement of any particular $x_j$. We are, so to speak, 
concerned with the number of components p, not the way in which any one is measured. 
Such a method is not distribution-free, but if any worry is felt about the non-normality 
of the data, the original variables can be replaced by ranks and the correlation procedure 
still applies. In fact, we can extend the method to cover qualitative data, provided that 
the categories in which they occur are orderable (see, for example, 33.36, Vol. 2). The 
metric, one might claim, is a natural one. 

But when we consider the grouping problem of n points in a p-space such con- 
siderations no longer apply. “ Distances” may be greatly affected by altering the 
scale of one of the variables and, indeed, can be assigned almost any values we like by 
stretching scales in the right way. Sometimes the difficulty can be overcome, or at 
least partially met, by an initial standardization. We prefer, however, a distribution- 
free method based on ranks. 


44.39 We shall now set up a distance function, not between variables but between 
individuals. Thus, if the variate values of the jth and kth members are $x_{1j}, x_{2j}, \ldots, x_{pj}$ 
and $x_{1k}, x_{2k}, \ldots, x_{pk}$, we require a measure of correlation between them. To compute 
a correlation based on the values as they stand would be nugatory; for example, changing 
the sign of one vector variable would alter the value of product-moment correlations. 

We therefore replace the n values of any component $x_\alpha$ by a set of ranks from 1 to n. 
These ranks may be tied for any set of members exhibiting the same value of $x_\alpha$; and 
in particular qualitative data in ordered categories may be regarded as tied rankings, 
so that our method has a very general application. The p × n matrix of rank values 
typified by $r_{\alpha k}$, $\alpha = 1, 2, \ldots, p$; $k = 1, 2, \ldots, n$ replaces the original matrix. 

For each of the pairs of sample members we calculate the function, analogous to a 
chi-squared measure, 
$$ D_{jk} = \sum_{\alpha=1}^{p} \frac{(r_{\alpha j}-r_{\alpha k})^2}{\operatorname{var} r_\alpha}. \tag{44.77} $$
The variance of a set of n ranks depends on the number of ties present. If there are 
ties of $t_1, t_2, \ldots$, etc. members, we have 
$$ n \operatorname{var} r = \tfrac{1}{12}(n^3-n) - \tfrac{1}{12}\sum (t^3-t). \tag{44.78} $$
(Cf. Exercise 44.4.) Note that although the ranks are not invariant under reversal of 
scale, the measure (44.77) is. 


44.40 A practical difficulty, as in most classification procedures, arises from the 
number of pairs which can be chosen from n members, namely $\tfrac12 n(n-1)$. Thus, for 
a sample of 100 there are 4950 pairs, each with a value of $D_{jk}$. To proceed in the 
manner of 44.37 and form groups by adding one member at a time is a sufficiently 
complicated exercise to require a computer; but it presents no theoretical problems. 
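
The computation of the distances is itself straightforward. The sketch below is a minimal illustration under stated assumptions: the function names and the toy data are hypothetical; tied values receive the mean of the ranks they occupy; the variance of each ranking is taken from (44.78); and a variable on which every member is tied (zero variance) is assumed not to occur.

```python
from collections import Counter
from itertools import combinations


def midranks(values):
    """Ranks 1..n, tied values sharing the mean of the ranks they occupy."""
    first_position = {}
    for position, v in enumerate(sorted(values), start=1):
        first_position.setdefault(v, position)
    counts = Counter(values)
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]


def rank_variance(values):
    """var r from n var r = (n^3 - n)/12 - sum(t^3 - t)/12, as in (44.78)."""
    n = len(values)
    tie_term = sum(t ** 3 - t for t in Counter(values).values())
    return ((n ** 3 - n) - tie_term) / (12 * n)


def rank_distances(data):
    """data: list of p variables, each a list of n observations.
    Returns {(j, k): D_jk} over all n(n-1)/2 pairs of members, as in (44.77)."""
    n = len(data[0])
    ranked = [(midranks(x), rank_variance(x)) for x in data]
    return {
        (j, k): sum((r[j] - r[k]) ** 2 / v for r, v in ranked)
        for j, k in combinations(range(n), 2)
    }


# Hypothetical data: two variables on four members, the first with one tie.
demo = [[10, 20, 20, 40], [1, 2, 3, 4]]
distances = rank_distances(demo)
print(len(distances), round(distances[(0, 1)], 3))
print(100 * 99 // 2)  # number of pairs for a sample of 100
```

For a sample of 100 the dictionary would hold 4950 entries, which is the practical difficulty referred to in 44.40.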


Example 44.7 


A heuristic procedure which gives at least a preliminary idea of the extent of the 
grouping may be illustrated on some of the data of Table 44.1 (Kendall, 1966). A 
table was constructed from the figures for versicolor and virginica by regarding them 
as a single sample of 100 of unknown origin. The first two variables, sepal length 
and sepal width, were ranked in the ordinary way from 1 to 100. Petal length was 
split into four categories, values <4, >4 and <5, >5 and <6, and >6. Petal width 
was condensed into two categories <2 and >2, a very heavily tied ranking. The 
4950 values of $D_{jk}$ were computed and used to sort the 100 members into classes. 

The data gave two well-defined classes, comprising 58 (A) and 25 (B) members, none 
of them overlapping. Of the 17 remaining there was another group round a further 
pair, but in fact this group comprised 30 members, 9 new ones from the 17, and 21 in 
common with A. It was therefore decided to amalgamate the 9 with A to form a 
group A' of 67 members. B remained with 25. The remaining 8 did not fall into a 
clearly defined class. 

There was thus fairly clear evidence of two classes, and only two. But whereas B 
contained only versicolor, A' contained 48 virginica and 19 versicolor. On this basis 
we should correctly arrive at the number of classes, but misclassify 19 and leave 8 
doubtful. In the analogous problem of discrimination we decided 87 cases and left 
13 doubtful. However, in the present example, we sacrificed a good deal of information 
by grouping petal length and petal width, so the results are not discordant. Reference 
may be made to Kendall (1966) for details. 


44.41 The subject is far from being exhausted. There are several ways of dividing 
members into groups, even when a suitable distance metric has been decided upon. 
For example, it is possible to consider a classification based on the intra-group distance 
in relation to the between-group distance. Wald (1944) proposed a statistic of this 
kind which is closely akin to the discriminant function. 

As a final comment, we would remark that, under the influence of the papers by 
Fisher (1936) and Wald (1944), statisticians have tended to approach the problems of 
discrimination and classification by looking for a single function of the variables. This 
appears to us to be a procedure which, in many circumstances, may be too restrictive. 
What is required is an allocation rule or set of rules; and this may or may not make 
use of a linear function, or a single function, of the variables. 


EXERCISES 


44.1 Taking $x_1$ and $x_2$ as sepal length and sepal width, respectively, for the data of Iris setosa 
and versicolor in Example 44.1, show that the linear discriminant function is 
$$ x_1 - 1.236\,x_2 $$
and that this is nearly as good as the four-variable discriminator of the example. 


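44.2 Show that the discriminating boundary given by (44.9) may be written 
$$ U = \mathbf{x}'\mathbf{V}^{-1}(\boldsymbol{\mu}_1-\boldsymbol{\mu}_2) - \tfrac12(\boldsymbol{\mu}_1+\boldsymbol{\mu}_2)'\mathbf{V}^{-1}(\boldsymbol{\mu}_1-\boldsymbol{\mu}_2) = 0 $$
where $\mathbf{V}$ is the parental dispersion matrix. 
Show that if $\mathbf{x}$ is distributed as $N(\boldsymbol{\mu}_1, \mathbf{V})$, U has mean equal to 
$$ \tfrac12(\boldsymbol{\mu}_1-\boldsymbol{\mu}_2)'\mathbf{V}^{-1}(\boldsymbol{\mu}_1-\boldsymbol{\mu}_2) = \tfrac12\alpha, \text{ say,} $$
and variance α. 
If $\mathbf{x}$ is distributed according to $N(\boldsymbol{\mu}_2, \mathbf{V})$, show that U has mean $-\tfrac12\alpha$ and variance α. 
(T. W. Anderson, 1958. The distribution when sample 
estimates are inserted for $\boldsymbol{\mu}$ and $\mathbf{V}$ is very complicated 
but asymptotically is the same. See Wald (1944), 
Sitgreaves (1952), T. W. Anderson (1951), and John 
(1961).) 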
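44.3 $\mathbf{x}, \mathbf{x}_{11}, \mathbf{x}_{12}, \ldots, \mathbf{x}_{1n_1}$ are drawn from population 1 which is $N(\boldsymbol{\mu}_1, \mathbf{V})$. $\mathbf{x}_{21}, \mathbf{x}_{22}, \ldots, \mathbf{x}_{2n_2}$ 
are drawn from population 2 which is $N(\boldsymbol{\mu}_2, \mathbf{V})$. Consider this against the alternative hypothesis 
that $\mathbf{x}_{11}, \ldots, \mathbf{x}_{1n_1}$ are drawn from population 1 and $\mathbf{x}, \mathbf{x}_{21}, \ldots, \mathbf{x}_{2n_2}$ from population 2. Show 
that the likelihood ratio for testing the composite hypothesis is 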


1+—_(«-,)’'V-1(x-,) 
nl 
m ее 
P Esa (x-x)V-!(x—-X) 
(T. W. Anderson, 1958) 
44.4 In a set of n ranks the ranks $p+1, p+2, \ldots, p+t$ are tied and allotted the rank 
$p+\tfrac12(t+1)$. Show that their sum of squares is reduced by $\tfrac{1}{12}(t^3-t)$. Deduce the formula 
for the variance of a ranking with $t_1, t_2, \ldots, t_m$ ties: 
$$ n \operatorname{var} r = \tfrac{1}{12}(n^3-n) - \tfrac{1}{12}\sum (t^3-t). $$


44.5 The data for versicolor and virginica may be classified by petal length and petal width 
in the following ordered contingency table. (For petal width “small” means ≤ 1.5; for petal 
length “small” means ≤ 4.0, “medium” means > 4 and ≤ 5, “large” means > 5. Figures 
to the left of the colon refer to versicolor, those to the right to virginica.) 


Petal width 


Show that if a new member is assigned on the basis of a majority in the cell frequency in 
which it falls, the probability of misclassification can be estimated at 8 per cent. Consider the 
meaning and reliability of this figure. 


44.6 A vector x is p-variate normal and samples are drawn from each of two populations 
with identical dispersion matrices. Each variable in each population is scaled to have unit 
variance, and those in the first population have zero mean. The other means are given by 
$\delta_j$, $j = 1, 2, \ldots, p$, and are taken as positive (by a change of sign of $x_j$ if necessary). 

If $x_j$ alone is used as discriminator show that the probability of misclassification, errors of 
either kind being equally important, is 
$$ \frac{1}{\sqrt{2\pi}} \int_{\frac12\delta_j}^{\infty} \exp(-\tfrac12 x^2)\,dx. $$
(Cochran (1964). The suggestion has been made that if this probability is large, say $\delta_j < \tfrac12$, 
the variable should be discarded as a poor discriminator.) 


44.7 In the previous exercise, if all p variables are independent, show that the best 
combined discriminator, scaled to unit variance, is 
$$ \sum_{j=1}^{p} \delta_j x_j \Big/ \Big(\sum_{j=1}^{p} \delta_j^2\Big)^{1/2}. $$
For two independent variates $x_1$, $x_2$ with values $\delta_1$, $\delta_2$ ($\delta_1 > \delta_2$) show that an observation on 
the first variate is equivalent to m observations on the second variate, where $m = \delta_1^2/\delta_2^2$. 


44.8 In the previous exercise, let the variables $x_1$, $x_2$ have correlation ρ. By considering 
the independent variables $x_2 - \rho x_1$ and $x_1$ show that if $\delta_2 = f\delta_1$, $0 < f < 1$, the correlation improves 
the discrimination over what it would be if the variables were independent, provided that 
$$ \frac{(f-\rho)^2}{1-\rho^2} > f^2. $$
Hence show that a negative correlation always helps the discrimination but that a positive 
correlation is harmful unless $\rho > 2f/(1+f^2)$. 


(Cochran, 1964) 


44.9 Continuing the previous exercise, suppose that the correlations between any pair 
$x_j$, $x_k$ are the same and equal to ρ. Show that if ρ is negative, a discriminator based on all p 
variables is better than it would be if they were independent; but that if ρ is positive this is not 
so unless 


(Cochran, 1964) 


44.10 Show that discrimination with two populations may be formally represented as a 
Least Squares regression analysis in which the dependent variable y can assume only two values, 
namely $m = n_2/(n_1+n_2)$ for the $n_1$ members of the first population and $(m-1)$ for the $n_2$ members 
of the second. This yields the boundary in Exercise 44.2 without the assumption of 
multinormality. 


44.11 Discuss the approach of Exercise 44.10 for more than two populations, and show 
why it breaks down. 


CHAPTER 45 
TIME-SERIES: GENERAL 


45.1 Observations on a phenomenon which is moving through time generate an 
ordered set known as a time-series. The values assumed by a variable at time t may 


Table 45.1: Annual yields per acre of barley in England and Wales from 1884 to 1939 
(Data from the Agricultural Statistics) 


        Yield per           Yield per           Yield per           Yield per 
Year    acre (cwt)  Year    acre (cwt)  Year    acre (cwt)  Year    acre (cwt) 

1884    15.2        1898    16.9        1912    14.2        1926    16.0 
  85    16.9          99    16.4          13    15.8          27    16.4 
  86    15.3        1900    14.9          14    15.7          28    17.2 
  87    14.9          01    14.5          15    14.1          29    17.8 
  88    15.7          02    16.6          16    14.8          30    14.4 
  89    15.1          03    15.1          17    14.4          31    15.0 
  90    16.7          04    14.6          18    15.6          32    16.0 
  91    16.3          05    16.0          19    13.9          33    16.8 
  92    16.5          06    16.8          20    14.7          34    16.9 
  93    13.3          07    16.8          21    14.3          35    16.6 
  94    16.5          08    15.5          22    14.0          36    16.2 
  95    15.0          09    17.3          23    14.5          37    14.0 
  96    15.9          10    15.5          24    15.4          38    18.1 
  97    15.5          11    15.5          25    15.3          39    17.5 

Fig. 45.1: Graph of the data of Table 45.1 (barley yields per acre) 


Table 45.2: Total annual rainfall at London in inches, for each year from 1813 to 1912 
(Data from D. Brunt, Phil. Trans., A, 225, 247, 1925) 


        Rainfall            Rainfall            Rainfall            Rainfall 
Year    (inches)    Year    (inches)    Year    (inches)    Year    (inches) 

1813    23.56       1838    21.63       1863    21.59       1888    27.74 
  14    26.07         39    27.49         64    16.93         89    23.85 
  15    21.86         40    19.43         65    29.48         90    21.23 
  16    31.24         41    31.13         66    31.60         91    28.15 
  17    23.55         42    23.09         67    26.25         92    22.61 
  18    23.88         43    25.85         68    23.40         93    19.80 
  19    26.41         44    22.65         69    25.42         94    27.94 
  20    22.67         45    22.75         70    21.32         95    21.47 
  21    31.69         46    26.36         71    25.02         96    23.52 
  22    23.86         47    17.70         72    33.86         97    22.86 
  23    24.11         48    29.81         73    22.67         98    17.69 
  24    32.43         49    22.93         74    18.82         99    22.54 
  25    23.26         50    19.22         75    28.44       1900    23.28 
  26    22.57         51    20.63         76    26.16         01    22.17 
  27    23.00         52    35.34         77    28.17         02    20.84 
  28    27.88         53    25.89         78    34.08         03    38.10 
  29    25.32         54    18.65         79    33.82         04    20.65 
  30    25.08         55    23.06         80    30.28         05    22.97 
  31    27.76         56    22.21         81    27.92         06    24.26 
  32    19.82         57    22.18         82    27.14         07    23.01 
  33    24.78         58    18.77         83    24.40         08    23.67 
  34    20.12         59    28.21         84    20.35         09    26.75 
  35    24.34         60    32.24         85    26.64         10    25.36 
  36    27.42         61    22.27         86    27.01         11    24.79 
  37    19.44         62    27.57         87    19.21         12    27.88 
Fig. 45.2: Graph of the last 50 terms of the data of Table 45.2 (rainfall)

or may not embody an element of random variation, but in the majority of cases with
which we shall be concerned some such element is present, if only as an error of observation.
We may regard a set of values $u(t_1), u(t_2), \ldots, u(t_n)$ as the observed values of a
multivariate complex. Their characteristic feature, however, is that the order of the
set $t_1, t_2, \ldots, t_n$ is material and not, for example, accidental as it would be for a random
sample $x_1, x_2, \ldots, x_n$, in which the suffixes are adjoined for convenience of identification.


Table 45.3: Average number of eggs per laying hen in the U.S.A. for each
month of the years 1938-1940
(Data from Report of the Bureau of Agricultural Economics, U.S. Dept.
of Agriculture, on the Poultry and Egg Situation, March 1941)

Year   Jan.   Feb.   Mar.   Apr.   May    June   July   Aug.   Sept.  Oct.   Nov.   Dec.

1938    7.9    9.9   15.4   17.5   17.3   14.9   13.6   11.8    9.4    7.5    5.9    6.4
1939    8.0    9.7   14.9   17.0   17.0   14.6   13.2   11.7    9.3    7.4    6.0    6.8
1940    7.2    9.0   14.4   16.5   17.0   14.8   13.4   11.8    9.7    7.9    6.2    6.8
Fig. 45.3: Graph of the data of Table 45.3 (egg production; average number of eggs per hen by month, 1938 to 1940)


45.2 Although the variable $t$ will always be spoken of and thought of as a time-parameter,
the theory we are about to develop has obvious applications to variation in
space. For example, if we consider the variation in thickness of a cotton thread along
its length $l$, or the variation in intensity of wire-worm infestation along a traverse $d$
in a field, the variables $l$ and $d$ may be interpreted in a manner analogous to $t$. Indeed,
it is possible to generalize to space the notion of a random variable dependent on more
than one such parameter. This is not a generalization we shall attempt.

Table 45.4: Sheep population of England and Wales for each year from 1867 to 1939
(Data from the Agricultural Statistics)

Year    Population   Year  Population   Year  Population   Year  Population

1867    2203         1886  1892         1905  1823         1924  1484
  68    2360           87  1919           06  1843           25  1597
  69    2254           88  1853           07  1880           26  1686
  70    2165           89  1868           08  1968           27  1707
  71    2024           90  1991           09  2029           28  1640
  72    2078           91  2111           10  1996           29  1611
  73    2214           92  2119           11  1933           30  1632
  74    2292           93  1991           12  1805           31  1775
  75    2207           94  1859           13  1713           32  1850
  76    2119           95  1856           14  1726           33  1809
  77    2119           96  1924           15  1752           34  1653
  78    2137           97  1892           16  1795           35  1648
  79    2132           98  1916           17  1717           36  1665
  80    1955           99  1968           18  1648           37  1627
  81    1785         1900  1928           19  1512           38  1791
  82    1747           01  1898           20  1338           39  1797
  83    1818           02  1850           21  1383
  84    1909           03  1841           22  1344
  85    1958           04  1824           23  1384
Fig. 45.4: Graph of the data of Table 45.4 (sheep population)

45.3 A further feature which distinguishes the set $u(t)$ from the values of a
multivariate complex is that $t$ is continuous, and we may therefore have to consider an
infinity of values of $u(t)$. It is customary and convenient (though not, perhaps, very
exact) to speak of a continuous time-series when we mean that $t$ is continuous, not


Table 45.5: Population of England and Wales at ten-yearly intervals from 1811 to 1961
(Data from the Registrar-General's Statistical Review)

        Population          Population
Year    (millions)    Year  (millions)

1811    10.16         1891  29.00
  21    12.00         1901  32.53
  31    13.90           11  36.07
  41    15.91           21  37.89
  51    17.93           31  39.95
  61    20.07           41    -
  71    22.71           51  43.76
  81    25.97           61  46.07
Fig. 45.5: Graph of the data of Table 45.5 (population of England and Wales, 1811 to 1961)


necessarily implying that $u(t)$ is continuous for any given $t$ in the variables under
discussion. Likewise, by a discontinuous series we mean one given at a (discontinuous)
set of points $t_1, t_2, \ldots, t_n$, although $u$ itself may be a continuous variable such as a
length or a weight.

For the greater part of our treatment we shall be concerned with discontinuous
series, but shall indicate applications to the continuous case where necessary. In fact,
we shall mostly deal with series which are defined at equidistant points of time; and,
taking the time-interval as unit, we may denote the values by $u_1, u_2, u_3$, etc. If we
require to discuss values for time-points prior to the starting point $u_1$, we may similarly
denote them by $u_0, u_{-1}$ and so on.


Some examples of time-series 

45.4 The reader is doubtless familiar with many examples of time-series such as 
occur in ordinary life: the sales curve of a commodity over a period of years, the records 
of temperature or barometric pressure at a locality, drawn by a stylus on a rotating 
drum, the population of a country at a series of census dates, and so forth. We proceed 
to give a few specific examples which will indicate the kind of domain to be covered 
and serve for numerical exemplification of the theory to be developed later. 

Table 45.1 (illustrated in Fig. 45.1) gives the annual yields per acre of barley in 
England and Wales from 1884 to 1939. Table 45.2 (Fig. 45.2) gives the annual rainfall 
in London for each year from 1813 to 1912. Table 45.3 (Fig. 45.3) gives the average 
egg-production per laying hen in the U.S.A. for each month of the years 1938 to 1940. 
Table 45.4 (Fig. 45.4) gives the sheep population of England and Wales as at June 4th 
of each year from 1867 to 1939. Table 45.5 (Fig. 45.5) shows the human population
of England and Wales at 10-yearly intervals from 1811 to 1961. 


45.5 These series are fairly typical of the kind of material with which our theory 
has to deal. The data of Table 45.1 (barley) present a very irregular fluctuation but, 
so far as the eye can see, there is no systematic element and no tendency towards increase 
or decrease over the period given. Table 45.2 has some indications of oscillatory
movements of a more regular kind. Table 45.3 provides an oscillatory effect which is
definitely seasonal. Table 45.4 (sheep population) combines a general decline in 
numbers with marked oscillatory effects. Table 45.5 (human population) shows a 
regular growth without apparent fluctuation. 


Types of discontinuity 
45.6 The tables also illustrate various types of discontinuity to which observed 
series are subject: 


(a) In the barley series we have a case of essential discontinuity. There is one and 
only one yield per acre for each year. The actual time of harvest may vary from 
year to year but, roughly speaking, the intervals between successive observations 
are equal. 

(b) In the population series we have a discontinuity of observation, due to the fact 
that a census is taken only every ten years. The variable, however, exists all 
through the period covered and could be observed (theoretically) at any point of 
time. The same is true of the sheep series. 

(c) In the rainfall data we have a discontinuity due to aggregation. The “rainfall”
does not exist at a single point of time; it is the summation over a finite time-interval 
which is of interest. That interval, of course, is at choice. We may observe by 
year, by month, by day or even by hour. Intervals may overlap, as when we 
compile in successive weeks the rainfall for the previous month. Such data are 
nevertheless discontinuous time-series in our sense. 

(d) When we have a continuous record, e.g. on a barograph, we cannot tabulate it
after the manner of Tables 45.1-45.5. We can take readings where we like, but 
not everywhere. In consequence, we cannot analyse such series by digital com- 
putation except as an approximation; we can, however, analyse them by methods 
involving graphical integration, e.g. by a planimeter or some more elaborate device. 
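
Case (c) can be illustrated by a short computation. The following is only a sketch: the daily figures are hypothetical (they are not taken from Table 45.2), and the function name `overlapping_totals` is our own. It forms totals over intervals of fixed length, which overlap whenever the step between successive intervals is shorter than the interval itself.

```python
def overlapping_totals(values, window, step):
    """Sum `values` over intervals of length `window`, advancing by `step`.

    When step < window the intervals overlap, as when the total for the
    previous month is recompiled each week.
    """
    totals = []
    for start in range(0, len(values) - window + 1, step):
        totals.append(sum(values[start:start + window]))
    return totals


# Hypothetical daily rainfall in hundredths of an inch (illustrative only).
daily = [0, 20, 0, 50, 10, 0, 0, 30, 0, 0, 40, 0]

# Non-overlapping 4-day totals, and 4-day totals recompiled every 2 days.
disjoint = overlapping_totals(daily, window=4, step=4)
overlapped = overlapping_totals(daily, window=4, step=2)
```

Either list is a discontinuous series in the sense of the text; only the disjoint totals add up to the total for the whole period.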


45.7 In practice the time-points at which we observe the series are often determined 
for us, especially in economics. In experimental situations we may be able to decide 
them ourselves before the data are collected, or afterwards if a full record has been kept. 
The question what is the best interval of observation is to be decided in the light of 
the circumstances of the individual case, and is not one on which we can enter at this 
point. (Very little theoretical work has, in fact, been done on it.) We may note here, 
however, that observation at fixed equal intervals, convenient as it may be, can suppress 
evidence of oscillatory movements which have a period equal to those intervals or some 
sub-multiple of them. The annual observation of the sheep population, for example, 
will take no account of seasonal variation within the year due to slaughtering or breeding; 
the annual rainfall figures conceal the fact that rainfall is seasonal to some extent, even 
in London. 
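
The suppression of oscillations by observation at fixed intervals is easily demonstrated numerically. The sketch below is illustrative only: it assumes a pure sine wave (no real series is so simple), and the names `sample` and `spread` are our own. A wave whose period equals the interval of observation, or a sub-multiple of it, is always read at the same phase and so appears constant.

```python
import math


def sample(period, interval, n):
    """Observe sin(2*pi*t/period) at t = 0, interval, 2*interval, ..."""
    return [math.sin(2 * math.pi * (k * interval) / period) for k in range(n)]


def spread(values):
    """Range of the observed values."""
    return max(values) - min(values)


same = sample(period=1.0, interval=1.0, n=20)   # period equal to the interval
half = sample(period=0.5, interval=1.0, n=20)   # period a sub-multiple of it
slow = sample(period=7.3, interval=1.0, n=20)   # period of several intervals

# The first two series show no movement (up to rounding error); the third does.
spreads = (spread(same), spread(half), spread(slow))
```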


Calendar trouble 

45.8 Whether time-intervals are equal or not, it is obviously desirable that observa- 
tions should be comparable inter se. For series which are based on days or months 
there are certain nuisance-effects, due to the nature of the calendar, which have to be 
removed to ensure comparability. Some of these difficulties we can lay at the door of 
Nature, for not arranging that the year shall contain an integral number of days; but 
most of them are attributable to the man-made calendar. Months, for example, are 
not the same length; public holidays affect the comparability of economic and social 
data; exchanges and markets close over the week-end; and so on. Experimentally 
generated series are usually free from such difficulties if due care is taken, but they 
can arise in industrial series both in the large (e.g. stoppages due to strikes) or in the 
small (e.g. meal-breaks). We shall suppose that our data have been corrected for 
such effects so as to bring them on to a comparable basis. 
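
One common correction of this kind, for unequal lengths of month, is to express each monthly total as a rate per day. The following sketch shows the idea on a hypothetical series; the function name `daily_rate` and the figures are assumptions for illustration, and the correction for holidays or week-ends would require further information.

```python
import calendar


def daily_rate(monthly_totals, year):
    """Convert twelve monthly totals into average figures per day."""
    rates = []
    for month, total in enumerate(monthly_totals, start=1):
        days = calendar.monthrange(year, month)[1]  # number of days in the month
        rates.append(total / days)
    return rates


# Hypothetical output of exactly 10 units per day throughout 1939.
totals_1939 = [10 * calendar.monthrange(1939, m)[1] for m in range(1, 13)]
rates_1939 = daily_rate(totals_1939, 1939)
```

The monthly totals vary between 280 and 310 although the underlying rate is constant; the corrected figures are all equal.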


The problems of time-series analysis 

45.9 The ultimate object of analysis of a time-series—as of statistical analysis as a 
whole—is to arrive at a deeper understanding of the causal mechanisms which generated 
it, either out of sheer curiosity or because we wish to extrapolate into the future. It 
does not follow, however, that such understanding can be achieved by considering one 
series alone; for the series may be only a single facet of a complex phenomenon gener- 
ating a substantial number of different series. We shall revert to this question in 
Chapter 50 when we discuss multivariate systems. For the present we curb our 
ambition to some extent by confining ourselves to the study of the type of behaviour 
of a single series and the setting up of models which can generate it; recognizing that 
such models themselves may be only portions of a more basic structural system. We 
shall see later that no logical inconsistency need be produced by this approach. 

45.10 A survey of the practical examples we have given and of others known to
the reader suggests that the typical time-series may be composed of four parts:

(a) a trend, or long-term movement;
(b) oscillations about the trend, of greater or less regularity;
(c) a seasonal effect;
(d) a “random,” “unsystematic” or “irregular” component.


As a matter of mathematical description, we can always represent a series as one 
of these constituents or the sum of several of them. A large part of the traditional 
theory of time-series, in fact, is devoted to an analysis of the data into such components, 
so as to isolate them for separate study. We must, however, attempt to avoid a trap 
here. It does not follow that if we can represent a series as a sum of such components, 
they correspond to independently operating causal systems. The decomposition of a 
series is very often useful, but it may be misleading and in any case is not the ultimate 
object of statistical analysis. 


45.11 Perhaps the easiest component to understand and to remove from the series 
is the seasonal effect. This is a fluctuation imposed on the series by a cyclic phenomenon 
external to the main body of causal influences at work upon it. The oscillation in 
egg-production in Table 45.3, for instance, reflects the rhythm in the reproductive 
process which is found among birds in virtue, ultimately, of the fact that the earth goes 
round the sun once a year. We shall confine the word “seasonal” to those effects
which are annual in period; but the same ideas can be applied to any phenomenon
generated by strictly periodic natural processes, such as “spring” and “neap” variation
in tides or daily variation in temperature. We must, however, be careful about
extending the notion of seasonality to phenomena which are not demonstrated beyond 
reasonable doubt to depend on strictly periodic stimuli. For instance, it would be 
going too far, in the present state of our knowledge, to speak of sunspot variation as 
seasonal in this sense, and much too far to speak of seasonality in crop-yields as deter- 
mined by sunspots, even if the relation between the two were established. We shall 
return to this point below when defining what we mean by a “ cycle ” as distinct from 
an “oscillation.” 


45.12 The concept of trend is more difficult to define. Generally, one thinks of 
it as a smooth broad motion of the system over a long term of years, but “ long ” in 
this connexion is a relative term, and what is long for one purpose may be short for 
another. For example, if we were examining rainfall records over a hundred years, a 
slow rise from the beginning of the period to the end would be regarded as a trend; 
but if we possessed records for two thousand years (and the rings in some of the giant 
redwood trees give an index of climatic conditions for periods of this order) the rise 
over a particular century might appear as part of a slow oscillatory movement, so that 
any inference from the “ trend ” in a particular century to the effect that the weather 
was likely to continue becoming wetter and wetter might be quite false. What inference 
we should make in practice would depend on what we were trying to do. If we were 
engineers designing a water-supply system and wished to provide against droughts of 
reasonable extent, we might perhaps assume that the trend would last as long as our
works and proceed accordingly; but if we were attempting to study climatic changes 
over the face of the earth for geological periods of time we should accept the continuance 
of the trend with the greatest reserve or, more probably, should reject it on collateral 
grounds. 


45.13 However long a series may be, we can never be certain, and often not even 
reasonably sure, that a trend in it is not part of a slow oscillation, except of course 
when the series has terminated (as might, for instance, be the case if we were con- 
sidering the lengths of reigns of the Roman Emperors). In speaking of a trend, there- 
fore, we must bear in mind the length of the series to which our statement refers. 
Perhaps it would be more accurate to speak of slow or quick movements rather than of 
trend and oscillation, but even so the distinction between the two would remain a matter 
of subjective judgement to some extent. 


45.14 When seasonal variation and trend have been removed from the data we are 
left with a series which will present, in general, fluctuations of a more or less regular 
kind. Fig. 45.1 represents the kind of series we obtain, since it has no components 
of trend or seasonality. The question then arises, is this residual series systematic in 
the sense that its values can be represented as a function of the time? Or, on the 
other hand, are the values random in the sense that they could occur, in the observed 
order, by random sampling from a homogeneous population? Or again, is there some 
possibility intermediate between complete functional variation and complete random- 
ness? The search for systematic effects in residual fluctuation gives rise to several 
techniques of analysis, the object of which is to detect whether any part of the series is 
subject to law, and therefore predictable, and whether any part is purely haphazard. 
The former part we shall call systematic, and it will be referred to as an “ oscillation ” 
(not a “ cycle,” which is a very special case of an oscillation, as we shall see later). 
The remainder of the series we shall call the unsystematic component, and refer to its 
movements as “random” or “stochastic.” When a series is a mixture of oscillation
and random movement it will not cause any inconvenience to refer to the up-and-down 
movement generally as fluctuation before we have analysed it into its constituents; 
that is to say, we may speak of fluctuation without prejudice to the possibility of detect- 
ing oscillatory movements in it. 


Tests of randomness 

45.15 Some of the series with which we are concerned are clearly not random. It 
would be a waste of time to test the data of Tables 45.3 and 45.5 for the presence of 
some systematic effects. In some cases, however, it is not obvious whether systematiza- 
tion is present, as for example in Table 45.1 (barley yields) and Table 45.2 (rainfall). 
We shall spend most of the rest of this chapter discussing tests of randomness in series. 
Specifically, given an ordered series of observations $u_1, u_2, \ldots, u_n$, can they have arisen
by chance in that order by sampling independently on $n$ occasions from a population
of unknown characteristics?

45.16 There is no limit to the number of tests which can be set up for this purpose.
In choosing the most suitable we must have regard to a number of criteria: 


(a) If possible, the test should be distribution-free. 

(b) Since we may wish to test fairly long series, the calculations should be kept to a 
minimum. 

(c) Although we may not be able to specify an alternative hypothesis with precision, 
we may have some idea of its nature and can select a test which is likely to have high 
power against the alternative. For example, if we suspect trend we may find it 
useful to employ a different test from one used to test against periodicity. 


We proceed to consider some tests satisfying these criteria. 


Turning points 

45.17 One of the easiest tests to apply is to count the number of peaks or troughs 
in the series. A “peak” is a value which is greater than the two neighbouring values.
If there are two or more equal values which are greater than their predecessor and
successor (a rare event in general) we shall regard them as defining one peak. Likewise
a “trough” is a value which is lower than its two neighbours. Our first question is:
What is the distribution of peaks in a random series? (The distribution of troughs is 
evidently the same with a change of sign of the variate.) 

In point of fact, we shall find it more convenient to treat both peaks and troughs 
as cases of “turning points ” of the series. The number of turning points is clearly 
one less than the number of runs up and down in the series. The interval between 
two turning points is called a “ phase.” 


45.18 Three consecutive observations are required to define a turning point, say
$u_1, u_2, u_3$. If the series is random these three values could have occurred in any order,
namely in six ways. In only four of these ways would there be a turning point (when
the greatest or least value is in the middle). Hence the probability of a turning point
in a set of three values is $\tfrac{2}{3}$.

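Consider now a set of values $u_1, u_2, \ldots, u_n$, and let us define a “marker” variable
$X_i$ by

$$X_i = \begin{cases} 1, & u_i < u_{i+1} > u_{i+2} \ \text{or}\ u_i > u_{i+1} < u_{i+2}, \\ 0, & \text{otherwise;} \end{cases} \qquad i = 1, 2, \ldots, n-2. \tag{45.1}$$

The number of turning points $p$ is then simply

$$p = \sum_{i=1}^{n-2} X_i. \tag{45.2}$$

We have at once

$$E(p) = \sum E(X_i) = \tfrac{2}{3}(n-2). \tag{45.3}$$

Also

$$E(p^2) = E\Big(\sum_{i=1}^{n-2} X_i\Big)^2 = E\Big\{\sum_{n-2} X_i^2 + 2\sum_{n-3} X_i X_{i+1} + 2\sum_{n-4} X_i X_{i+2} + \sum_{(n-4)(n-5)} X_i X_{i+k}\Big\}, \qquad k \neq 0, \pm 1, \pm 2, \tag{45.4}$$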
where the suffixes to the $\sum$ signs indicate the number of terms over which summation
takes place. As a check note that

$$(n-2)^2 = (n-2) + 2(n-3) + 2(n-4) + (n-4)(n-5).$$

We then have

$$E(p^2) = (n-2)E(X_i^2) + 2(n-3)E(X_i X_{i+1}) + 2(n-4)E(X_i X_{i+2}) + (n-4)(n-5)E(X_i X_{i+k}). \tag{45.5}$$

Since $X_i^2 = X_i$ we have

$$E(X_i^2) = \tfrac{2}{3}. \tag{45.6}$$

For $k > 2$, $X_i$ and $X_{i+k}$ are independent, for they have no value of $u$ in common. Thus

$$E(X_i X_{i+k}) = E(X_i)E(X_{i+k}) = \tfrac{4}{9}. \tag{45.7}$$


It remains to evaluate $E(X_i X_{i+1})$ and $E(X_i X_{i+2})$. For the first, consider four
consecutive terms which, in ascending order of magnitude, may be denoted by the
numbers 1, 2, 3, 4. The only non-vanishing contribution to $X_i X_{i+1}$ which can arise
from a permutation of these numbers arises when there is a turning point in the second
and third places. If the reader will write down the 24 possible permutations he will
find that only ten make a non-vanishing contribution, namely

    1324    2143    3142    4132
    1423    2314    3241    4231
            2413    3412

Thus

$$E(X_i X_{i+1}) = \tfrac{10}{24} = \tfrac{5}{12}. \tag{45.8}$$

For $X_i X_{i+2}$ we have to write down the 120 permutations of the integers 1 to 5 and
count up those with turning points at both the second and fourth places. There
are, in fact, 54 and thus

$$E(X_i X_{i+2}) = \tfrac{54}{120} = \tfrac{9}{20}. \tag{45.9}$$
Substituting in (45.5) we find, on reduction,

$$\mu_2'(p) = \frac{40n^2 - 144n + 131}{90}.$$

Hence, using (45.3), we have

$$\operatorname{var} p = \frac{16n - 29}{90}. \tag{45.10}$$

Higher moments can be obtained in a similar manner. We find

$$\kappa_3(p) = -\frac{16(n+1)}{945}, \tag{45.11}$$

$$\kappa_4(p) = \frac{-1408n + 3317}{18900}. \tag{45.12}$$

Thus, in standard measure $\kappa_3(p)$ is approximately $-0.2n^{-1/2}$ and $\kappa_4(p)$ is approximately
$-2.4n^{-1}$, indicating a fairly rapid tendency to normality as $n$ increases.
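
The results (45.3), (45.8), (45.9) and (45.10) can be checked by direct enumeration for small $n$. The following sketch does this by listing all $n!$ orderings of $n$ distinct values; the function names are our own and the method is brute force, not the argument of the text. The variance formula rests on the term counts in (45.4), so the check is meaningful only for $n \geq 4$.

```python
from fractions import Fraction
from itertools import permutations


def turning_points(series):
    """Count interior values that are a strict peak or a strict trough."""
    count = 0
    for a, b, c in zip(series, series[1:], series[2:]):
        if (a < b > c) or (a > b < c):
            count += 1
    return count


def exact_moments(n):
    """Mean and variance of p over all n! orderings of n distinct values."""
    counts = [turning_points(perm) for perm in permutations(range(n))]
    total = len(counts)
    mean = Fraction(sum(counts), total)
    var = Fraction(sum(c * c for c in counts), total) - mean ** 2
    return mean, var


def joint_probability(length, places):
    """Probability of turning points at all the given 0-based interior places."""
    perms = list(permutations(range(length)))
    hits = 0
    for perm in perms:
        if all((perm[i - 1] < perm[i] > perm[i + 1]) or
               (perm[i - 1] > perm[i] < perm[i + 1]) for i in places):
            hits += 1
    return Fraction(hits, len(perms))


n = 6
mean, var = exact_moments(n)        # compare with (45.3) and (45.10)
p_adjacent = joint_probability(4, (1, 2))   # compare with (45.8)
p_lag_two = joint_probability(5, (1, 3))    # compare with (45.9)
```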

45.19 Now consider the distribution of phase lengths. To define a phase of
length $d$ (say, a run up) we require $d+3$ terms, involving a fall from first to second, a
rise from second to third, third to fourth, ..., $(d+1)$th to $(d+2)$th, and a fall from $(d+2)$th
to $(d+3)$th. Consider the $d+3$ values arranged in increasing order of magnitude. If
we pick out two other than the first and the last and transfer one to the beginning and
one to the end, we obtain a rising phase of length $d$. There are $\tfrac{1}{2}(d+1)d$ ways of
picking out the pair, and each may go to either end, so there are $(d+1)d$ rising phases.
But in addition we may put the first member at the end and any of the others except
the second at the beginning, giving us $d+1$ further cases; or the last member at the
beginning and any except the penultimate at the end, giving $d+1$ further cases; and
from this total we must subtract the case where the first is last and the last first, which
has been counted twice. Thus there are

$$(d+1)d + (d+1) + (d+1) - 1 = d^2 + 3d + 1$$

rising phases. The probability of a phase, either rising or falling, is then

$$\frac{2(d^2+3d+1)}{(d+3)!}. \tag{45.13}$$

Now in a series of length $n$ there are $n-d-2$ possible phases of length $d$. The
expected number of phases of length $d$ in the set of $n$ values is then

$$N_d = \frac{2(n-d-2)(d^2+3d+1)}{(d+3)!}. \tag{45.14}$$

The expected total number of phases $N$, from (45.14), is given by

$$N = 2\sum_{d=1}^{n-3} \frac{(n-d-2)(d^2+3d+1)}{(d+3)!}.$$

Now

$$(n-d-2)(d^2+3d+1) = -(d+3)(d+2)(d+1) + (n+1)(d+3)(d+2) - (2n+1)(d+3) + (n+1)$$

and hence

$$N = 2\sum_{d=1}^{n-3}\left\{-\frac{1}{d!} + \frac{n+1}{(d+1)!} - \frac{2n+1}{(d+2)!} + \frac{n+1}{(d+3)!}\right\} = 2\left(\frac{2n-7}{6} + \frac{1}{n!}\right). \tag{45.15}$$

For all practical purposes we may neglect the second term in (45.15) and hence

$$N = \tfrac{1}{3}(2n-7). \tag{45.16}$$


Since the number of phases is one less than the number of turning points except in
the 2 cases out of $n!$ where both are zero, (45.15) agrees with (45.3). Now

$$\frac{N_d}{N} = \frac{6(n-d-2)(d^2+3d+1)}{(2n-7)(d+3)!}. \tag{45.17}$$

We may derive the moments of this ratio fairly easily. For example,

$$\mu_1' = \frac{6}{2n-7}\sum_{d=1}^{n-3}\frac{d(n-d-2)(d^2+3d+1)}{(d+3)!}
= \frac{6}{2n-7}\sum_{d=1}^{n-3}\left\{-\frac{1}{(d-1)!} + \frac{n+1}{d!} - \frac{3n+2}{(d+1)!} + \frac{5n+3}{(d+2)!} - \frac{3(n+1)}{(d+3)!}\right\}.$$

Remembering the rapid convergence of $\sum_{x=0}^{n} 1/x!$ to $e$, we may to a very close
approximation write this as

$$\mu_1' = \frac{6}{2n-7}\left\{-e + (n+1)(e-1) - (3n+2)(e-2) + (5n+3)\left(e-\tfrac{5}{2}\right) - 3(n+1)\left(e-\tfrac{8}{3}\right)\right\}
= \frac{3(n+7-4e)}{2n-7} \sim \frac{3}{2}. \tag{45.18}$$

Likewise we find that

$$\mu_2 = \frac{3}{(2n-7)^2}\left\{(8e-21)n^2 + (4e-17)n - (48e^2-140e+14)\right\} \sim 0.560. \tag{45.19}$$

45.20 The distribution of which these are the moments does not tend to normality
as $n$ increases (cf. Exercise 45.1). A natural procedure in testing for randomness is
to compare the observed distribution with the expected distribution given by (45.14).
For shorter series, however, there is a theoretical difficulty in that the lengths of phase
are not independent, so that a straightforward $\chi^2$ goodness-of-fit test is not valid. The
question was examined by Wallis and Moore (1941) who came to the conclusion that
for a three-fold classification $d = 1, 2, \geq 3$ (two degrees of freedom) the $X^2$ statistic(*)
can be tested in the ordinary form with $\nu = 2\tfrac{1}{2}$ for $X^2 > 6.3$. For lower values $\tfrac{6}{7}X^2$
can be tested in that form with $\nu = 2$.

Wolfowitz (1944) and Levene (1952) showed that the number of phases tends to
normality and Gleissberg (1945) tabulated the distribution of this number for $n \leq 25$.


Example 45.1 

Consider the barley yields of Table 45.1. There are 56 values in this series, but
at two points (1906, 1907 and 1910, 1911) the values in successive years are equal.
So far as concerns turning points and phases we shall count each of these as one point
and reduce the number of terms to 54.

If the reader will mark the peaks and troughs on the table, or count them on Fig. 45.1,
he will find that there are 35 turning points. The expected number, from (45.3), is
$\tfrac{2}{3}(52) = 34\tfrac{2}{3}$. This is so close to observation that no further test is necessary.

The distribution of phases will be found to be

Phase     No. of phases    No. of phases expected,
length    observed         from (45.14), (45.16)

1         23               21.25
2          7                9.17
3          4                2.59
Total     34               33.67


Again a test is hardly necessary. 


(*) See footnote to page 421, Vol. 2.

The conclusion would be, on these tests, that the variation in yield from year to
year was random. 
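
The counts in Example 45.1 may be reproduced mechanically. The sketch below enters the yields of Table 45.1, collapses the two pairs of equal successive values as described in the example, and counts turning points and phase lengths. The helper names are our own, and the grouping of all phases of length three or more into one class is an assumption made here for the tabulation.

```python
# Barley yields (cwt per acre), England and Wales, 1884-1939, from Table 45.1.
barley = [
    15.2, 16.9, 15.3, 14.9, 15.7, 15.1, 16.7, 16.3, 16.5, 13.3, 16.5, 15.0, 15.9, 15.5,
    16.9, 16.4, 14.9, 14.5, 16.6, 15.1, 14.6, 16.0, 16.8, 16.8, 15.5, 17.3, 15.5, 15.5,
    14.2, 15.8, 15.7, 14.1, 14.8, 14.4, 15.6, 13.9, 14.7, 14.3, 14.0, 14.5, 15.4, 15.3,
    16.0, 16.4, 17.2, 17.8, 14.4, 15.0, 16.0, 16.8, 16.9, 16.6, 16.2, 14.0, 18.1, 17.5,
]


def collapse_ties(series):
    """Treat a run of equal successive values as a single point."""
    out = [series[0]]
    for value in series[1:]:
        if value != out[-1]:
            out.append(value)
    return out


def turning_point_positions(series):
    """Indices of strict peaks and strict troughs."""
    return [i for i in range(1, len(series) - 1)
            if (series[i - 1] < series[i] > series[i + 1])
            or (series[i - 1] > series[i] < series[i + 1])]


u = collapse_ties(barley)
n = len(u)                                   # 54 terms after collapsing ties
tp = turning_point_positions(u)
p = len(tp)                                  # observed number of turning points
expected_p = 2 * (n - 2) / 3                 # (45.3)
var_p = (16 * n - 29) / 90                   # (45.10)
z = (p - expected_p) / var_p ** 0.5          # standardized deviate

# A phase is the interval between two successive turning points.
phases = [b - a for a, b in zip(tp, tp[1:])]
observed = {1: phases.count(1), 2: phases.count(2),
            "3+": sum(1 for d in phases if d >= 3)}
```

The standardized deviate `z` is a small fraction of unity, in agreement with the conclusion that no further test is necessary.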


45.21 Considered as a test against trend the turning-points test has a poor
performance, and we shall see later (Exercise 45.4) that it has zero efficiency compared
with other tests in certain cases. This is intuitively reasonable, for “turning” is a local
property and would not be much affected by whereabouts along a line of gentle trend
development the series had arrived. Considered as a test against cyclicality the test is
obviously better. In a random series the mean interval between turning points is
about 1.5 with a variance (from (45.10) and (10.14)) of about $9/(10n)$. The test itself
is enough to enjoin further investigation in series of more than 10 terms whenever the
mean interval between turning points is 2 or more.

The power of tests against specific alternatives for runs up and down has been 
investigated by Levene (1952). 


The difference-sign test 
45.22 A somewhat more laborious test consists of counting the number of positive 
first differences of the series, that is to say, the number of points where the series 
increases. (As before, we shall ignore points where there is neither increase nor
decrease.) With a series of $n$ terms we have $n-1$ differences. Let us, as before, define
a variable

$$X_i = \begin{cases} 1, & u_{i+1} > u_i, \\ 0, & \text{otherwise;} \end{cases} \qquad i = 1, 2, \ldots, (n-1). \tag{45.20}$$

Then the number of points of increase, say $c$, is given by

$$c = \sum_{i=1}^{n-1} X_i.$$

For a random series we have immediately

$$E(c) = (n-1)E(X_i) = \tfrac{1}{2}(n-1). \tag{45.21}$$

Likewise

$$E(c^2) = E\Big\{\sum_{n-1} X_i^2 + 2\sum_{n-2} X_i X_{i+1} + \sum_{(n-2)(n-3)} X_i X_j\Big\}, \qquad j \neq i,\ i \pm 1,$$

$$= (n-1)E(X_i^2) + 2(n-2)E(X_i X_{i+1}) + (n-2)(n-3)E(X_i X_j)$$

$$= \tfrac{1}{2}(n-1) + 2(n-2)E(X_i X_{i+1}) + \tfrac{1}{4}(n-2)(n-3). \tag{45.22}$$

To evaluate $E(X_i X_{i+1})$ we consider permutations of three. Only in one case out of six
does this give a non-vanishing contribution. Hence, from (45.22) and (45.21) we find

$$\operatorname{var} c = \tfrac{1}{2}(n-1) + \tfrac{1}{3}(n-2) + \tfrac{1}{4}(n-2)(n-3) - \tfrac{1}{4}(n-1)^2 = \tfrac{1}{12}(n+1). \tag{45.23}$$


The distribution tends fairly rapidly to normality (cf. Exercise 45.3). It has been 
tabulated by Moore and Wallis (1943). 
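
As with the turning-points test, the moments (45.21) and (45.23) can be verified by enumeration for a short series. The sketch below is such a check; the function names are our own, and ties are simply not counted as increases.

```python
from fractions import Fraction
from itertools import permutations


def points_of_increase(series):
    """Number of positive first differences, the statistic c of the text."""
    return sum(1 for a, b in zip(series, series[1:]) if b > a)


def null_moments(n):
    """Mean (45.21) and variance (45.23) of c for a random series."""
    return Fraction(n - 1, 2), Fraction(n + 1, 12)


def brute_force_moments(n):
    """Mean and variance of c over all n! orderings of n distinct values."""
    counts = [points_of_increase(perm) for perm in permutations(range(n))]
    mean = Fraction(sum(counts), len(counts))
    var = Fraction(sum(c * c for c in counts), len(counts)) - mean ** 2
    return mean, var


example_c = points_of_increase([1, 2, 3, 2, 5])   # three increases, one decrease
```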


45.23 This test is clearly useless against an alternative of symmetrical oscillation,
where the number of movements up will approximate to the number of movements
down. It has been advocated mainly as a test against trend, and especially against
linear trend. As such it is very superior to the turning-points test but very inferior to
other tests based on rank order which we consider below.
Consider, in fact, a series with a linear trend and a random residual

$$u_t = \alpha + \beta t + \epsilon_t, \tag{45.24}$$

where $\epsilon_t$ is a normal variable with zero mean and unit variance. We can regard this
as a regression of $u_t$ on $t$ in the special case when $t$ takes the equidistant values
$1, 2, \ldots, n$. In the regression situation we should estimate $\beta$ by

$$b = \frac{\sum (u_t - \bar{u})(t - \bar{t})}{\sum (t - \bar{t})^2}, \tag{45.25}$$

which is unbiassed and has variance

$$\operatorname{var} b = \frac{1}{\sum (t-\bar{t})^2} = \frac{12}{n(n^2-1)}. \tag{45.26}$$

We now use the asymptotic relative efficiency (cf. 25.5-6) to compare other consistent
tests with that based on $b$. Since $b$ is unbiassed, $[\partial E(b)/\partial\beta]_{\beta=0} = 1$, and (25.16)
becomes, with $m = 1$,

$$[E'(b)]_{\beta=0}/(\operatorname{var} b)^{1/2} \sim (n^3/12)^{1/2}. \tag{45.27}$$

Thus, for the statistic $b$, $\delta$ defined by (25.16) takes the value

$$\delta = \tfrac{3}{2}. \tag{45.28}$$

We now compute $\delta$ for the difference-sign test statistic.
Consider the “marker” variable

$$H_{ij} = \begin{cases} 1, & u_i > u_j, \\ 0, & u_i < u_j, \end{cases} \tag{45.29}$$

with $H_{ij} + H_{ji} = 1$.
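The expectation of $H_{ij}$ is the probability that $H_{ij}$ equals unity, and since $u_i - u_j$ is
a normal variable with mean $\beta(i-j)$ and variance 2, this is equal to

$$\frac{1}{\sqrt{2\pi}\,\sqrt{2}} \int_0^\infty \exp\{-\tfrac{1}{4}[x - \beta(i-j)]^2\}\,dx
= \frac{1}{\sqrt{2\pi}} \int_{-\beta(i-j)/\sqrt{2}}^{\infty} \exp(-\tfrac{1}{2}y^2)\,dy.$$

Hence

$$\left[\frac{\partial}{\partial\beta} E(H_{ij})\right]_{\beta=0} = \frac{i-j}{2\sqrt{\pi}}. \tag{45.30}$$

This is all we require for the ARE of the present test, but for later purposes we proceed
to calculate some other quantities of a similar kind.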
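Since $H_{ij}$, $H_{kl}$ are independent, $i, j, k, l$ unequal, we have

$$\left[\frac{\partial}{\partial\beta}E(H_{ij}H_{kl})\right]_{\beta=0}
= \left[E(H_{ij})\frac{\partial}{\partial\beta}E(H_{kl})\right]_{\beta=0} + \left[E(H_{kl})\frac{\partial}{\partial\beta}E(H_{ij})\right]_{\beta=0}
= \frac{1}{4\sqrt{\pi}}(i-j+k-l). \tag{45.31}$$

Consider now $H_{ij}H_{jk}$. Since $u_i-u_j$ and $u_j-u_k$ are jointly normally distributed with
correlation $-\tfrac{1}{2}$ we have

$$E(H_{ij}H_{jk}) = \operatorname{Prob}(H_{ij}=1, H_{jk}=1)
= \frac{1}{\pi\sqrt{3}}\int_{-\beta(i-j)/\sqrt{2}}^{\infty}\int_{-\beta(j-k)/\sqrt{2}}^{\infty} \exp\{-\tfrac{2}{3}(x^2+xy+y^2)\}\,dy\,dx$$

and

$$\left[\frac{\partial}{\partial\beta}E(H_{ij}H_{jk})\right]_{\beta=0} = \frac{i-k}{4\sqrt{\pi}}. \tag{45.32}$$

Similarly

$$\left[\frac{\partial}{\partial\beta}E(H_{ij}H_{ik})\right]_{\beta=0} = \frac{2i-j-k}{4\sqrt{\pi}}. \tag{45.33}$$

Finally we require the similar expression when $u_{i+2}-u_{i+1}$ and $u_{i+1}-u_i$ are of opposite
signs. This is the probability that $(u_{i+2}-u_{i+1})(u_{i+1}-u_i)$ is negative and is seen to be

$$\frac{1}{\pi\sqrt{3}}\left\{\int_{-\beta/\sqrt{2}}^{\infty}\int_{-\infty}^{-\beta/\sqrt{2}} + \int_{-\infty}^{-\beta/\sqrt{2}}\int_{-\beta/\sqrt{2}}^{\infty}\right\}\exp\{-\tfrac{2}{3}(x^2+xy+y^2)\}\,dy\,dx,$$

whence we find

$$\left[\frac{\partial}{\partial\beta}\operatorname{Prob}\{(u_{i+2}-u_{i+1})(u_{i+1}-u_i) < 0\}\right]_{\beta=0} = 0. \tag{45.34}$$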

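Reverting to the difference-sign test, we have that the number of positive increments in the series (cf. (45.20)) is

$$c = \sum_{i=1}^{n-1} H_{i+1,\,i}.$$

Thus from (45.30),

$$\left[\frac{\partial}{\partial\beta}E(c)\right]_{\beta=0} = \sum_{i=1}^{n-1} \frac{1}{2\sqrt{\pi}} = \frac{n-1}{2\sqrt{\pi}}. \tag{45.35}$$

Remembering that the variance is, from (45.23), (n+1)/12, we have

$$\left[\frac{\partial}{\partial\beta}E(c)\right]_{\beta=0} \Big/ (\operatorname{var} c)^{1/2} \sim \left(\frac{3n}{\pi}\right)^{1/2}. \tag{45.36}$$

Thus, for the difference-sign test, δ defined at (25.16), Vol. 2, takes the value

$$\delta = \tfrac{1}{2}, \tag{45.37}$$

and comparison of (45.37) and (45.28) shows that c has zero asymptotic relative efficiency, by (25.24).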
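
The difference-sign statistic is simple to compute directly. The following is a minimal sketch, not taken from the text: the function name is hypothetical, the null variance (n+1)/12 is the value quoted above, and the null mean (n-1)/2 is the standard value for a random series, assumed here because it is not restated in this section.

```python
from math import sqrt

def difference_sign_test(u):
    """Return (c, z): the count of positive first differences and a normal score.

    Assumes a series without ties. Under randomness c is taken to have
    mean (n - 1)/2 (standard value, assumed) and variance (n + 1)/12.
    """
    n = len(u)
    c = sum(1 for i in range(n - 1) if u[i + 1] > u[i])
    mean_c = (n - 1) / 2
    var_c = (n + 1) / 12
    return c, (c - mean_c) / sqrt(var_c)
```

A large positive score suggests an upward trend and a large negative one a downward trend, subject to the low efficiency of the test noted above.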

Mann (1945a) gave a lower bound to the power of the test. Stuart (1952) has 
tabulated the power of the test against the normal regression alternative at the 95 per 
cent level. 


Rank correlation tests 

45.24 There is a prior presumption that we shall improve our test still further if we compare, not merely neighbouring pairs as in the difference-sign test, but all pairs. Given a set of values $u_1, u_2, \ldots, u_n$ in that order, let us count the number of pairs in which $u_j > u_i$, $j > i$. If this is P, we note that there are $\tfrac{1}{2}n(n-1)$ pairs and that the expected number in a random series is $\tfrac{1}{4}n(n-1)$. The excess of P over this number indicates a tendency to positive trend, a deficiency corresponding to a negative trend.

In fact, this quantity is a simple linear function of the rank correlation coefficient(*) τ, defined at (31.23), Vol. 2, between the order of the variables in time and their order in magnitude u. For a random series the variance of τ is known. If Q is the complementary quantity to P, namely the number of pairs for which $u_j < u_i$, $j > i$, we have, by (31.23),

$$\tau = 1 - \frac{4Q}{n(n-1)} \tag{45.38}$$

and from (31.33-4),

$$E(\tau) = 0, \tag{45.39}$$

$$\operatorname{var}\tau = \frac{2(2n+5)}{9n(n-1)}. \tag{45.40}$$

The distribution of τ tends rapidly to normality (cf. 31.26).
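
The counting rule of this section can be sketched as follows. This is an illustration only: the function name is hypothetical, ties are assumed absent, and the formulas used are τ = 1 - 4Q/{n(n-1)} with null variance 2(2n+5)/{9n(n-1)}.

```python
from math import sqrt

def kendall_trend_test(u):
    """Return (P, Q, tau, z) for the series u taken in time order (no ties assumed)."""
    n = len(u)
    # P: pairs (i < j) with the later value larger; Q: pairs with it smaller.
    P = sum(1 for i in range(n) for j in range(i + 1, n) if u[j] > u[i])
    Q = sum(1 for i in range(n) for j in range(i + 1, n) if u[j] < u[i])
    tau = 1 - 4 * Q / (n * (n - 1))
    var_tau = 2 * (2 * n + 5) / (9 * n * (n - 1))
    return P, Q, tau, tau / sqrt(var_tau)
```

The double loop costs of order n² comparisons, which is of no consequence for series of the lengths considered here.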


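45.25 In the notation of 45.23 we may write

$$Q = \sum_{i<j} H_{ij}.$$

In the case of an alternative linear trend (45.24) with normal residuals, we then have, from (45.30),

$$\left[\frac{\partial}{\partial\beta}E(Q)\right]_{\beta=0} = \frac{1}{2\sqrt{\pi}} \sum_{i<j} (i-j) = -\frac{1}{2\sqrt{\pi}} \cdot \frac{n(n^2-1)}{6}.$$

Also, from (45.40),

$$\operatorname{var} Q \sim \frac{n^3}{36},$$

so that, in absolute value,

$$[E'(Q)]_{\beta=0}/(\operatorname{var} Q)^{1/2} \sim \{n^3/(4\pi)\}^{1/2}. \tag{45.41}$$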


Substituting (45.41) and (45.27) in (25.27) we have for the ARE of τ (or Q) relative to the regression estimator,

$$A_{\tau,b} = \lim_{n\to\infty} \left\{ \frac{[E'(Q)]_{\beta=0}/(\operatorname{var} Q)^{1/2}}{[E'(b)]_{\beta=0}/(\operatorname{var} b)^{1/2}} \right\}^{2/3} = \left(\frac{3}{\pi}\right)^{1/3} = 0.98, \tag{45.42}$$

a result previously given in 31.38. The statistic τ is therefore very efficient in this case.
Mann (1945b) considered the τ-test for variables $u_i$ such that $P(u_i > u_j) = \tfrac{1}{2} + \varepsilon_{ij}$ for i<j, and obeying certain other conditions, and (cf. Exercise 31.8) gave conditions for the test to be unbiassed, and an example in which it is the most powerful test.
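
The limit (3/π)^{1/3} can be checked numerically from finite-n expressions. The sketch below is illustrative only: it assumes unit residual variance, so that var b = 12/{n(n²-1)}, uses the exact null variance of Q implied by (45.40), and takes the exponent 2/3 as 1/δ with δ = 3/2; the function name is hypothetical.

```python
from math import pi, sqrt

def are_tau_vs_regression(n):
    """Finite-n approximation to the ARE of tau relative to the regression test."""
    d_q = n * (n * n - 1) / (12 * sqrt(pi))      # |dE(Q)/d(beta)| at beta = 0
    var_q = n * (n - 1) * (2 * n + 5) / 72       # null variance of Q
    var_b = 12 / (n * (n * n - 1))               # assumed: unit residual variance
    # E'(b) = 1, so the ratio of the two efficacies is (d_q/sd_q) * sd_b.
    ratio = (d_q / sqrt(var_q)) * sqrt(var_b)
    return ratio ** (2 / 3)
```

For moderate n the value is already close to its limit, since the squared ratio equals 6(n+1)/{π(2n+5)}.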


(*) In other contexts we write t for the statistic τ, but as this would cause confusion here with the time-variable, we depart temporarily from the usual convention not to employ Greek letters for sample values.

45.26 We can also calculate the Spearman rank correlation $r_s$ (31.22, Vol. 2) in this case. It may be written (cf. (31.40))

$$r_s = 1 - \frac{12V}{n(n^2-1)} \tag{45.43}$$

where

$$V = \sum_{i<j} (j-i) H_{ij}. \tag{45.44}$$

As at (31.22) we have, for a random series,

$$E(r_s) = 0, \qquad \operatorname{var} r_s = \frac{1}{n-1},$$

so that

$$E(V) = \frac{n(n^2-1)}{12}, \tag{45.45}$$

$$\operatorname{var} V = \frac{n^2(n+1)^2(n-1)}{144}. \tag{45.46}$$

We then find

$$\left[\frac{\partial}{\partial\beta}E(V)\right]_{\beta=0} = -\frac{1}{2\sqrt{\pi}} \sum_{i<j} (j-i)^2 = -\frac{n^2(n^2-1)}{24\sqrt{\pi}}, \tag{45.47}$$

and exactly as at (45.42) we find

$$A_{r_s,b} = \left(\frac{3}{\pi}\right)^{1/3} = 0.98. \tag{45.48}$$

The Spearman coefficient has, then, the same ARE as τ.


45.27 Both τ and $r_s$ are more troublesome to calculate than the difference-sign or the turning-point statistics; and in practice $r_s$ is easier to calculate than τ, using its form (31.21).


Example 45.2
Not to overburden ourselves with arithmetic, let us take the first twenty-five terms in Table 45.2 (years 1813-1837). In order of magnitude the values of $u_t$ are

  Rank  Rank  Difference²  |  Rank  Rank  Difference²  |  Rank  Rank  Difference²
     1     9       64      |    10    11        1      |    19    21        4
     2    18      256      |    11    13        4      |    20     2      324
     3     4        1      |    12    25      169      |    21    15       36
     4    23      361      |    13     8       25      |    22     3      361
     5    10       25      |    14     5       81      |    23    14       81
     6    12       36      |    15     7       64      |    24    20       16
     7    19      144      |    16    22       36      |    25     1      576
     8     6        4      |    17    17        0      |
     9    24      225      |    18    16        4      |  Total          2898

Thus $V = \tfrac{1}{2}(2898) = 1449$ and

$$r_s = 1 - \frac{12(1449)}{25 \times 624} = -0.11.$$

The correlation is small. The standard error of $r_s$ is $(n-1)^{-1/2}$, about 0.2, and the observed value is thus not significant of trend.

There are 17 turning points, against an expectation of $\tfrac{2}{3}(25) = 16.7$, in almost perfect agreement.
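
The arithmetic of this example can be reproduced in a few lines. The sketch below is illustrative: the list holds the second rank column of the table, read against the first column 1, 2, ..., 25, and the variable names are hypothetical. It also computes V from its definition as a weighted count of inversions, which should agree with half the sum of squared rank differences.

```python
# Second rank column of Example 45.2, paired with ranks 1..25 in order.
ranks = [9, 18, 4, 23, 10, 12, 19, 6, 24, 11, 13, 25, 8,
         5, 7, 22, 17, 16, 21, 2, 15, 3, 14, 20, 1]
n = len(ranks)

# Sum of squared rank differences, as tabulated.
sum_d2 = sum((i + 1 - r) ** 2 for i, r in enumerate(ranks))

# V as the sum over i < j of (j - i) whenever the earlier rank exceeds the later.
v_inversions = sum(j - i
                   for i in range(n) for j in range(i + 1, n)
                   if ranks[i] > ranks[j])

r_s = 1 - 12 * v_inversions / (n * (n * n - 1))
```

The agreement of the two routes to V is a useful check on the table itself.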


45.28 We may also mention, without entering into very great detail, two other 
tests which have some interest: 

(a) The records test. An observation $u_t$ is called an upper (lower) record if it exceeds (is smaller than) all previous observations in the series. The number of records appearing as we go along the series provides a test statistic which can be compared with the distribution from a random series. The subject has been explored by Foster and Stuart (1954), some of whose results are presented in Exercises 45.8-9. It appears that, as a test against trend, the records test is more powerful than the difference-sign test or the turning-points test, but is considerably less powerful than the τ or $r_s$ tests (Stuart, 1956, 1957).

(b) The rank serial correlation test. This is a special case of a type of statistic, the serial correlation coefficient, which we shall introduce in 45.32 below. If the ranks of a set of n quantities, measured from the mean rank $\tfrac{1}{2}(n+1)$, are $d_i$, i = 1, 2, ..., n, the coefficient of order k is defined by

$$r_k = \frac{\dfrac{1}{n-k}\displaystyle\sum_{i=1}^{n-k} d_i d_{i+k}}{\dfrac{1}{n}\displaystyle\sum_{i=1}^{n} d_i^2}. \tag{45.49}$$

So far as a test statistic is concerned we may use simply

$$W_k = \sum_{i=1}^{n-k} d_i d_{i+k}. \tag{45.50}$$

The coefficient $W_k$ is the covariance (multiplied by n-k) of the terms in the rank-series distance k units apart. For a random series its expected value is zero. As a test of trend the coefficient has zero efficiency against the normal linear regression alternative (cf. Noether (1950), Stuart (1954, 1956)), although it was suggested as a test against trend by Wald and Wolfowitz (1943).
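
The record counts of test (a) are easily obtained in a single pass. The sketch below is illustrative only: the function name is hypothetical, scoring is taken to begin at the second observation (as in Exercise 45.8), and ties are assumed absent.

```python
def record_counts(x):
    """Return (s, d): total number of records, and upper minus lower records.

    Scoring starts at the second observation; the first merely sets the
    initial maximum and minimum.
    """
    highest = lowest = x[0]
    upper = lower = 0
    for value in x[1:]:
        if value > highest:
            upper += 1
            highest = value
        elif value < lowest:
            lower += 1
            lowest = value
    return upper + lower, upper - lower
```

An upward trend tends to produce more upper than lower records, so that d is the natural statistic against trend.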


45.29 If a series is random we shall clearly be able to test it equally well by ignoring 
certain terms, e.g. by taking every other term or every twelfth term. To look at this 
from another viewpoint, if our series is only recorded at periodic intervals instead of 
in toto, our tests remain valid. We lose information, of course, but not validity in 
the tests. The same is true of aggregative series; for example, if we have the (annual) 
aggregate of twelve monthly records, each of which may be regarded as a member of 
a random series, the annual figures are also random. On the other hand, randomness 
in an annual series does not rule out the possibility of seasonal movements in the 
constituent series. 




45.30 Some series (for example, a razor blade under the microscope) present an 
irregular fluctuation which looks like a kind of randomness in the limit as the interval 
between successive observations becomes smaller. On the other hand, the series are 
continuous, and we arrive at the question whether it is possible to have a continuous 
random series. In our view, it is not. There is, to our mind, something essentially 
discontinuous in the idea of independence of successive observations; continuity would 
destroy independence. We can imagine a set of points, each determining the value 
of a random variable, becoming ever closer together, and the variance of the variable 
diminishing so that the total range of variation remains within finite bounds. But it 
does not appear possible to proceed to the limit in the way that the mathematician 
proceeds from an enumerable to a dense set of points on a line. 


45.31 For most, if not all, practical series we can imagine them as continuous to 
the eye but discontinuous under the microscope. Pressure may be thought of as a 
continuous variable but on a sufficiently small time-scale is discontinuous, being the 
result of impacts of individual molecules of gas. The profile of the cotton fibre may 
be similarly imagined as continuous, although ultimately composed of discontinuous 
particles. In this sense we may, perhaps, speak of a continuous random series, but 
it is a form of expression which we shall have to watch very carefully. To test such
a series for randomness we may take observations at any suitable interval and carry out 
on the resultant one of the tests we have discussed earlier in the chapter. A discussion 
of more refined methods of approach must await our account of correlogram and 
spectral analyses. 


Serial correlations 

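45.32 For series which are not random there will be dependencies of one kind or another between successive terms. One very useful measure of this effect is the product-moment correlation between successive observations. Given n values $u_1, u_2, \ldots, u_n$, the so-called serial correlation of lag 1 is defined by

$$r_1 = \frac{\displaystyle\sum_{i=1}^{n-1} (u_i - \bar{u}_1)(u_{i+1} - \bar{u}_2)}{\left\{\displaystyle\sum_{i=1}^{n-1} (u_i - \bar{u}_1)^2 \sum_{i=1}^{n-1} (u_{i+1} - \bar{u}_2)^2\right\}^{1/2}}, \tag{45.51}$$

where $\bar{u}_1$ and $\bar{u}_2$ are the means of the first and of the last n-1 terms respectively.

Likewise, the serial correlation of lag k is the correlation between pairs of terms k units apart, viz.

$$r_k = \frac{\displaystyle\sum_{i=1}^{n-k} (u_i - \bar{u}_1)(u_{i+k} - \bar{u}_2)}{\left\{\displaystyle\sum_{i=1}^{n-k} (u_i - \bar{u}_1)^2 \sum_{i=1}^{n-k} (u_{i+k} - \bar{u}_2)^2\right\}^{1/2}}, \tag{45.52}$$

the means now being those of the first and of the last n-k terms.

In practice (and also for theoretical convenience) it makes for simplicity to modify these definitions to some extent. Instead of measuring the first (n-k) u's about their mean, we may measure about the mean of the whole set of observations; and similarly for the values at the end. Similarly, instead of taking separate variances in the denominator of (45.52) we may use the variance of the whole series. Thus, writing $\bar{u}$ for $\sum_{i=1}^{n} u_i/n$, we may put

$$r_k = \frac{\dfrac{1}{n-k}\displaystyle\sum_{i=1}^{n-k} (u_i - \bar{u})(u_{i+k} - \bar{u})}{\dfrac{1}{n}\displaystyle\sum_{i=1}^{n} (u_i - \bar{u})^2}. \tag{45.53}$$

This is the form we shall mostly use. For series of moderate length the difference from (45.52) is negligible. We must be careful not to use (45.53) for short series where exactitude in estimation is necessary. In particular, values of $r_k$ greater than unity may arise.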


45.33 The array of coefficients $r_0$ (= 1), $r_1$, $r_2$, ... tells us a good deal about the nature of the internal dependence of the series. Their totality is called the correlogram, a term which is also used to denote the graph of $r_k$ as ordinate against k as abscissa. In a random series they are, apart from $r_0$, all equal to zero within sampling limits. We shall study their properties for other types of series at length in later chapters.

It is to be noted that, by definition, $r_{-k} = r_k$.


45.34 For certain theoretical enquiries, and for computational convenience in a minor way, the definition (45.53) may be modified still further. For a coefficient of order k there are only n-k terms in the numerator. Suppose we put

$$u_{n+1} = u_1, \quad u_{n+2} = u_2, \quad \ldots, \quad u_{n+k} = u_k. \tag{45.54}$$

We may then sum the product-moment in the numerator over n terms to obtain

$$r_k = \frac{\displaystyle\sum_{i=1}^{n} (u_i - \bar{u})(u_{i+k} - \bar{u})}{\displaystyle\sum_{i=1}^{n} (u_i - \bar{u})^2}. \tag{45.55}$$

We are obviously here distorting the data by assuming (45.54). But if k is small compared to n we are not distorting it very much. The coefficient $r_k$ of (45.55) is called a circular serial correlation. The point of introducing it will become evident in 47.30 and Chapter 48.
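
The two modified definitions may be compared numerically. The sketch below is an illustration under stated assumptions: the non-circular coefficient is taken as the lag-k product-moment about the overall mean divided by n-k, over the variance of the whole series divided by n, and the circular coefficient wraps the series round so that the term after the last is the first. Function names are hypothetical.

```python
def serial_correlation(u, k):
    """Lag-k serial correlation about the overall mean (assumed form of (45.53))."""
    n = len(u)
    mean = sum(u) / n
    d = [x - mean for x in u]
    numerator = sum(d[i] * d[i + k] for i in range(n - k)) / (n - k)
    denominator = sum(x * x for x in d) / n
    return numerator / denominator

def circular_serial_correlation(u, k):
    """Circular lag-k serial correlation: indices beyond n wrap to the start."""
    n = len(u)
    mean = sum(u) / n
    d = [x - mean for x in u]
    numerator = sum(d[i] * d[(i + k) % n] for i in range(n))
    return numerator / sum(x * x for x in d)
```

For the short series 2, 0, 0, 0, 2 and k = 4 the first form gives 1.5, illustrating that it may exceed unity, whereas the circular form cannot do so, by the Cauchy-Schwarz inequality.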


45.35 In concluding this chapter we may, for the avoidance of confusion, refer 
to a type of occurrence which is, in a sense, a time-series, though not of the kind we 
are considering here. Events such as accidents, arrivals of cars at traffic lights, out- 
breaks of epidemics, etc., may happen from time to time in a certain area and thus 
over a period constitute a series of events. The intervals between them are, usually, 
irregular but may nevertheless have a distribution function. Such patterns of behaviour 
are studied in the theory of stochastic processes. It is the intervals between happenings, 
rather than the happenings themselves, which are of interest, and this topic is really 
quite distinct from our time-series, which concerns a complex moving through time 
and observed at specified intervals. 
45.1 In the distribution of phases of 45.19 show that the moment-generating function of
Na/N for large n is given by 
З(е—#— 2е—%8-+ e-9)(g-1— 69) + § fe, 
where g = exp (е). 
Hence verify that $\mu_1' = 1.5$, $\kappa_2 = 0.560$, and show that $\kappa_3 = 0.677$, $\kappa_4 = 0.904$. The distribution is thus not normal.


45.2 For the distribution of positive first differences (from a random series) of 45.22 show that odd order cumulants vanish, and that

$$\kappa_4 = -(n+1)/120.$$


45.3 In the n! permutations of n numbers show that if $P_n(S)$ is the number with S positive differences,

$$P_n(S) = (S+1)P_{n-1}(S) + (n-S)P_{n-1}(S-1).$$
Hence obtain the recurrence expression for the moments 
En(x*+2) = 0 


Ente) = 131g, ee p Beale D4 
where x = S—E(S). 
Hence by induction show that

$$\lim_{n\to\infty} \frac{\mu_{2r}}{\mu_2^{\,r}} = (2r-1)(2r-3)\ldots 3\,.\,1$$

and thus that the distribution tends to normality.
(Mann, 1945a) 


45.4 Show that, against the normal regression alternative, the turning-point test based on p defined at (45.2) has

$$\{E'(p)\}_{\beta=0} = 0, \qquad \{E''(p)\}_{\beta=0} \neq 0,$$

and δ defined at (25.16) = $\tfrac{1}{4}$, so that its ARE, compared to the regression coefficient test, is zero; and that it is also zero compared to the difference-sign test.
(Stuart, 1954, 1956)


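45.5 Observing that, in the notation of 45.23, a rank $r_i$ may be expressed as

$$r_i = \sum_{j=1}^{n} H_{ij},$$

show that, for the normal regression alternative,

$$\left[\frac{\partial}{\partial\beta}E(r_i r_k)\right]_{\beta=0} = \frac{n(n+1)}{4\sqrt{\pi}}\{i + k - (n+1)\}.$$

Hence, for the statistic W of (45.50),

$$\left[\frac{\partial}{\partial\beta}E(W)\right]_{\beta=0} = 0, \qquad \left[\frac{\partial^2}{\partial\beta^2}E(W)\right]_{\beta=0} \sim \frac{n^5}{24\pi},$$

and

$$\operatorname{var} W \sim n^5/144,$$

and thus that W has $\delta_W = \tfrac{5}{4}$ and zero ARE compared to the regression estimator.
(Stuart, 1954, 1956)


45.6 If values $u_i$, i = 1, 2, ..., n are chosen at random from a set of continuous distributions with frequency functions $f_i(u)$, show that the expected number c of points of increase in the series $u_1, u_2, \ldots, u_n$ is given by

$$E(c) = \sum_{i=1}^{n-1} \int_{-\infty}^{\infty} \int_{-\infty}^{u_{i+1}} f_i(u_i)\, f_{i+1}(u_{i+1})\, du_i\, du_{i+1}.$$

For the rectangular distribution with linear trend

$$f_i(u) = \begin{cases} 1, & i\theta < u < i\theta + 1, \\ 0 & \text{elsewhere,} \end{cases}$$

show that

$$\lim_{n\to\infty} \frac{1}{n}E(c) = \begin{cases} \tfrac{1}{2}\{1 + \theta(2 - |\theta|)\}, & -1 \le \theta \le 1, \\ 0, & \theta < -1, \\ 1, & \theta > 1. \end{cases}$$

(Levene, 1952)


45.7 As in Exercise 45.6, show that for the normal distribution

$$f_i(u) = \frac{1}{\sqrt{2\pi}} \exp\{-\tfrac{1}{2}(u - \mu_i)^2\},$$

$$\lim_{n\to\infty} \frac{1}{n}E(c) = \Phi(\theta/\sqrt{2}),$$

where

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp(-\tfrac{1}{2}t^2)\,dt$$

and the trend is given by $\mu_i = (i-1)\theta$.
(Levene, 1952, who tabulates the values)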


45.8 For a random series $x_1, x_2, \ldots, x_n$ define

$u_r$ = 1 if the rth observation is an upper record,
      = 0 otherwise;

$l_r$ = 1 if the rth observation is a lower record,
      = 0 otherwise;

and

$$s_r = u_r + l_r, \qquad d_r = u_r - l_r.$$

Define also

$$s = \sum_{r=2}^{n} s_r, \qquad d = \sum_{r=2}^{n} d_r.$$

The scoring commences at the second observation.
Then if $p^{(r)}(s, d)$ is the joint frequency function of s and d in a series of r observations, show that

$$p^{(r)}(s, d) = \left(1 - \frac{2}{r}\right)p^{(r-1)}(s, d) + \frac{1}{r}\,p^{(r-1)}(s-1, d-1) + \frac{1}{r}\,p^{(r-1)}(s-1, d+1),$$

$$p^{(1)}(0, 0) = 1,$$

and hence derive the probability-generating function $g^{(n)}(\theta_1, \theta_2)$ as

$$g^{(n)}(\theta_1, \theta_2) = \frac{1}{n!} \prod_{r=2}^{n} (r - 2 + \theta_1\theta_2 + \theta_1/\theta_2).$$

Hence derive, for the characteristic functions of s and d,

$$\phi_s^{(n)}(t) = \frac{1}{n!} \prod_{r=2}^{n} (r - 2 + 2e^{it}),$$

$$\phi_d^{(n)}(t) = \frac{1}{n!} \prod_{r=2}^{n} (r - 2 + 2\cos t).$$

Derive also the joint c.f. and by inversion show that

$$p(s, d) = p(s)\, 2^{-s} \binom{s}{\tfrac{1}{2}(s+d)},$$

where p(s) is the frequency function of s given by

$$p(s) = \frac{2^s}{n!}\, u^{(n-2)}(n-s-1)$$

and $u^{(n)}(r)$ denotes the sum of products of all selections of r integers out of 1, 2, ..., n.
(Foster and Stuart, 1954)


45.9 In the foregoing exercise show that

$$E(d) = 0, \qquad \operatorname{var} d = 2\sum_{r=2}^{n} \frac{1}{r},$$

$$E(s) = 2\sum_{r=2}^{n} \frac{1}{r}, \qquad \operatorname{var} s = 2\sum_{r=2}^{n} \frac{1}{r} - 4\sum_{r=2}^{n} \frac{1}{r^2}.$$

(Foster and Stuart, 1954)

45.10 Show that $r_k$ of (45.55) cannot exceed unity, but that $r_k$ of (45.53) may do so.


45.11 For the coefficient of (45.53) calculated from a random series show that 


and hence that $r_k$ is biassed as an estimator of serial correlation in the parent series.


CHAPTER 46
TIME-SERIES: TREND AND SEASONALITY 


Determination of trend 

46.1 It is an essential part of the concept of trend that the movement over fairly long periods is smooth. This means that we can represent the trend component, at least locally, by a polynomial in the time element t. Thus, given the series $u_t$, we may, in the first instance, seek for some polynomial

$$u_t = a_0 + a_1 t + a_2 t^2 + \ldots + a_p t^p \tag{46.1}$$

which will give an account of the trend movement. By taking p great enough we can, of course, obtain as close a representation as we like to a finite series; and how large we take p is a matter for decision in particular cases.


We need not restrict ourselves to polynomials, although they are the most convenient mathematically. Any suitable function of the time can be taken, though we should naturally choose one which itself moved in a trend-like way. Growth curves, for example, may be represented by exponential functions, and population curves like that of Table 45.5 are sometimes represented by the logistic curve of type

$$u_t = \frac{k}{1 + e^{a+bt}}. \tag{46.2}$$

46.2 If a polynomial is fitted to the whole series by least squares, it evidently gives the curvilinear regression line of $u_t$ on the variable t. It is, however, clear that to obtain a satisfactory trend-curve for data such as that of Table 45.4 (sheep population), we should have to take a polynomial of rather high order or a somewhat complicated more general function. This may appear somewhat artificial and in any case the coefficients of such a polynomial, being based on high-order moments, would be very unstable from the sampling viewpoint. A more practical objection, though by no means an unimportant one, is that if we add another term to the series, as for example if we are keeping an annual series up to date from year to year, the work of fitting has to be done afresh each time. Moreover, the trend-line may be affected throughout its length. When, therefore, the series has no very obvious trend it is more convenient to use the simpler methods described below.


Moving averages 

46.3 An alternative to finding a polynomial which will represent the whole series is to determine a polynomial which will represent a part of it, and to use different polynomials for different parts. The simplest method, and one which forms the basis of the majority of methods of trend-fitting, is to take the first n terms (n being chosen at will), fit a polynomial of degree p, not greater than n-1, to them, and use that polynomial to determine the value in the middle of its range; then to repeat the operation with the n terms from the second to the (n+1)th, and so on, moving on one term at each stage. Unless other considerations require it, we take n to be odd, so that the middle point of the range corresponds in time to a value which is actually observed. Otherwise the middle point falls half-way between two observed values, or we have to use some value of the fitted polynomial other than the middle point, which results in a loss of useful symmetry.


46.4 Suppose, then, that the number of terms is chosen to be odd and is denoted by 2m+1. Without loss of generality we may denote the terms by $u_{-m}, u_{-m+1}, \ldots, u_0, \ldots, u_{m-1}, u_m$. If we choose to fit to them a polynomial of degree p we may, in the usual way, determine the coefficients by least squares, i.e. solve the equations

$$\frac{\partial}{\partial a_j} \sum_{t=-m}^{m} (u_t - a_0 - a_1 t - \ldots - a_p t^p)^2 = 0, \qquad j = 0, 1, \ldots, p, \tag{46.3}$$

which will give us equations typified by

$$\sum t^j u_t - a_0 \sum t^j - a_1 \sum t^{j+1} - \ldots - a_p \sum t^{j+p} = 0. \tag{46.4}$$

The sums $\sum t^j$ are functions of m only. Thus, if we solve (46.4) for $a_0$ we shall find an equation of the form

$$a_0 = c_0 u_{-m} + c_1 u_{-m+1} + \ldots + c_{2m} u_m, \tag{46.5}$$

where the c's depend on m and p, but not on the u's.

Now $u_t$ assumes the value $a_0$ at t = 0 and hence this value, as given by (46.5), is the value we require for the polynomial. As we see, this is equivalent to a weighted average of the observed values, the weights being independent of which part of the series is taken. Thus our process of fitting a trend-line consists of determining the constants c (which depend on m and p and therefore give us a twofold element of choice) and then calculating, for each consecutive set of (2m+1) terms in the series, a value given by (46.5). If the terms are $u_x, \ldots, u_{2m+x}$ the calculated value will correspond to t = m+x. A supplementary procedure is necessary to give values corresponding to the m terms at the beginning and the m terms at the end.


Example 46.1
Suppose we have a series and wish to fit a curve which best approximates to sets of seven points; and suppose we regard a cubic as providing a satisfactory approximation. What are the weights of the moving average?
We have m = 3 and p = 3, and our polynomial is

$$u_t = a_0 + a_1 t + a_2 t^2 + a_3 t^3.$$

Taking our origin at t = 0, we find, for equations (46.4), in virtue of the fact that $\sum t^k = 0$ for odd k,

$$\begin{aligned}
\sum u &= 7a_0 + 28a_2, \\
\sum tu &= 28a_1 + 196a_3, \\
\sum t^2 u &= 28a_0 + 196a_2, \\
\sum t^3 u &= 196a_1 + 1588a_3,
\end{aligned} \tag{46.6}$$

giving, for $a_0$,

$$a_0 = \tfrac{1}{21}\left(7\sum u - \sum t^2 u\right) = \tfrac{1}{21}\{-2u_{-3} + 3u_{-2} + 6u_{-1} + 7u_0 + 6u_1 + 3u_2 - 2u_3\}.$$

We may write this conveniently as

$$\tfrac{1}{21}[-2, 3, 6, 7, 6, 3, -2]$$

or, when symmetrical formulae are used, as in the present case, by

$$\tfrac{1}{21}[-2, 3, 6, \mathbf{7}],$$

denoting the middle term by heavy type.
To take a simple illustration. Suppose the series is given by the following values:

t:  1   2   3   4    5    6    7    8    9   10
u:  0   1   8  27   64  125  216  343  512  729

We have, for the trend-value at t = 4,

$$a_0 = \tfrac{1}{21}\{(-2 \times 0) + (3 \times 1) + (6 \times 8) + \ldots - (2 \times 216)\} = \tfrac{1}{21} \cdot 567 = 27.$$

The trend-value is equal to the actual value of the series, and this obviously must be so when we note that we are fitting a cubic to the series

$$u_t = (t-1)^3.$$

It will be observed that in this example we should have obtained the same value for $a_0$ if we fitted quadratics instead of cubics, for $a_0$ does not depend on $a_3$ in equations (46.6); and generally the case p odd includes the case of the next lowest (even) value of p, so that we need not give separate formulae for even p.
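
The weights of such a moving average can be generated mechanically for any m and p. The following is a minimal sketch in exact rational arithmetic, not the authors' procedure: it forms the normal equations for a polynomial of degree p fitted to t = -m, ..., m and extracts the weights that give the fitted value at t = 0. The function names are hypothetical.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination over Fractions."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[pivot] = M[pivot], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]

def central_weights(m, p):
    """Weights on u_{-m}, ..., u_m giving the least-squares value at t = 0."""
    ts = list(range(-m, m + 1))
    k = p + 1
    # Normal-equation matrix: entry (r, s) is the sum of t^(r+s).
    A = [[Fraction(sum(t ** (r + s) for t in ts)) for s in range(k)]
         for r in range(k)]
    # a_0 is the first component of A^{-1} b; A is symmetric, so solve A x = e_0
    # and read the weight on u_t as sum_r x_r t^r.
    x = solve(A, [Fraction(int(r == 0)) for r in range(k)])
    return [sum(x[r] * t ** r for r in range(k)) for t in ts]
```

With m = 3 and p = 3 this should return the weights -2/21, 3/21, 6/21, 7/21, 6/21, 3/21, -2/21 found above, and p = 2 gives the same result.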


46.5 Writing $a_0[k]$ for the value of $a_0$ calculated in the above manner for an average of k successive terms, we find the following formulae up to p = 5. The reader may care to verify them for himself as an exercise. It will be evident that the sum of coefficients in any formula is unity; for if we apply the trend to a set of values all equal to unity, the result must be unity.

Quadratic and Cubic

[5]   $\frac{1}{35}[-3, 12, \mathbf{17}]$
[7]   $\frac{1}{21}[-2, 3, 6, \mathbf{7}]$
[9]   $\frac{1}{231}[-21, 14, 39, 54, \mathbf{59}]$
[11]  $\frac{1}{429}[-36, 9, 44, 69, 84, \mathbf{89}]$
[13]  $\frac{1}{143}[-11, 0, 9, 16, 21, 24, \mathbf{25}]$                                        (46.7)
[15]  $\frac{1}{1105}[-78, -13, 42, 87, 122, 147, 162, \mathbf{167}]$
[17]  $\frac{1}{323}[-21, -6, 7, 18, 27, 34, 39, 42, \mathbf{43}]$
[19]  $\frac{1}{2261}[-136, -51, 24, 89, 144, 189, 224, 249, 264, \mathbf{269}]$
[21]  $\frac{1}{3059}[-171, -76, 9, 84, 149, 204, 249, 284, 309, 324, \mathbf{329}]$

Quartic and Quintic

[7]   $\frac{1}{231}[5, -30, 75, \mathbf{131}]$
[9]   $\frac{1}{429}[15, -55, 30, 135, \mathbf{179}]$
[11]  $\frac{1}{429}[18, -45, -10, 60, 120, \mathbf{143}]$
[13]  $\frac{1}{2431}[110, -198, -135, 110, 390, 600, \mathbf{677}]$
[15]  $\frac{1}{46189}[2145, -2860, -2937, -165, 3755, 7500, 10125, \mathbf{11063}]$            (46.8)
[17]  $\frac{1}{4199}[195, -195, -260, -117, 135, 415, 660, 825, \mathbf{883}]$
[19]  $\frac{1}{7429}[340, -255, -420, -290, 18, 405, 790, 1110, 1320, \mathbf{1393}]$
[21]  $\frac{1}{260015}[11628, -6460, -13005, -11220, -3940, 6378, 17655, 28190, 36660, 42120, \mathbf{44003}]$


46.6 It is sometimes more convenient to express these formulae in terms of the differences of the series $\Delta^r u$, where

$$\Delta u_t = u_{t+1} - u_t. \tag{46.9}$$

Thus, for example,

$$\tfrac{1}{21}[-2, 3, 6, 7, 6, 3, -2] = u_0 - \tfrac{1}{21}(9\Delta^4 u_{-2} + 2\Delta^6 u_{-3}) \tag{46.10}$$

which exhibits at once the fact that our fit is exact for a cubic, i.e. as far as fourth and higher differences. Or we may equally well represent the process as a moving average of the differences, which provides a convenient method of calculation when these differences are smaller than lower-order differences or the original values of the series. For instance

$$\tfrac{1}{21}[-2, 3, 6, \mathbf{7}] = u_0 + \tfrac{1}{21}\{2\Delta^3 u_{-3} + 3\Delta^3 u_{-2} - 3\Delta^3 u_{-1} - 2\Delta^3 u_0\} \tag{46.11}$$

$$= u_0 + \tfrac{1}{21}[2, 3, -3, -2]\Delta^3 u_{-3} \tag{46.12}$$

$$= u_0 - \tfrac{1}{21}[2, 5, 2]\Delta^4 u_{-3}. \tag{46.13}$$

We can obviously represent such formulae in a variety of different ways. (46.13) is particularly convenient because it gives us the residuals immediately.


Example 46.2
Suppose we wish to represent one of the other formulae in (46.8) in this manner, say the quintic fitted to 11 points:

$$\tfrac{1}{429}[18, -45, -10, 60, 120, \mathbf{143}].$$

We first of all subtract unity from the middle term to give

$$\tfrac{1}{429}[18, -45, -10, 60, 120, \mathbf{-286}]. \tag{46.14}$$

The sum of coefficients must now be zero. Denote by U a shift operator such that

$$U u_t = u_{t+1}. \tag{46.15}$$

Then

$$\Delta = U - 1. \tag{46.16}$$

The moving average (46.14) may, apart from the divisor 429, be written

$$18 - 45U - 10U^2 + 60U^3 + 120U^4 - 286U^5 + 120U^6 + 60U^7 - 10U^8 - 45U^9 + 18U^{10}.$$

We know that this is exact as far as fifth differences and consequently $\Delta^6 = (U-1)^6$ must be a factor. We find

$$(U-1)^6 (18U^4 + 63U^3 + 98U^2 + 63U + 18).$$

The original process may then (since U - 1 = Δ) be written

$$u_0 + \tfrac{1}{429}[18, 63, 98, 63, 18]\Delta^6 u_{-5}. \tag{46.17}$$
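
The factorization used in this example is a repeated exact division by U - 1 and is easily mechanized. The sketch below is illustrative only: the function name is hypothetical, coefficients are listed in ascending powers of U, and an error is raised if the division is not exact.

```python
def divide_by_difference(coeffs, order):
    """Divide a polynomial in U (ascending coefficients) by (U - 1)**order."""
    current = list(coeffs)
    for _ in range(order):
        quotient = []
        carry = 0
        # If P = (U - 1) Q then p_0 = -q_0 and p_k = q_{k-1} - q_k.
        for p_k in current[:-1]:
            carry = carry - p_k
            quotient.append(carry)
        if quotient[-1] != current[-1]:
            raise ValueError("(U - 1) is not a factor")
        current = quotient
    return current

# Weights of (46.14) with unity subtracted from the middle term, divisor 429 omitted.
quintic_11 = [18, -45, -10, 60, 120, -286, 120, 60, -10, -45, 18]
print(divide_by_difference(quintic_11, 6))
```

The same routine applied with order 4 to the seven-point cubic weights, after subtracting 21 from the middle term, gives -2, -5, -2, in agreement with the form of (46.13).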
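46.7 The following are the formulae for (46.7) and (46.8) in terms of differences:

Quadratic and Cubic

[5]   $u_0 - \frac{3}{35}[1]\Delta^4 u_{-2}$
[7]   $u_0 - \frac{1}{21}[2, 5, 2]\Delta^4 u_{-3}$
[9]   $u_0 - \frac{1}{231}[21, 70, 115, 70, 21]\Delta^4 u_{-4}$
[11]  $u_0 - \frac{1}{429}[36, 135, 280, \mathbf{385}]\Delta^4 u_{-5}$
[13]  $u_0 - \frac{1}{143}[11, 44, 101, 168, \mathbf{210}]\Delta^4 u_{-6}$                       (46.18)
[15]  $u_0 - \frac{1}{1105}[78, 325, 790, 1435, 2100, \mathbf{2478}]\Delta^4 u_{-7}$
[17]  $u_0 - \frac{1}{323}[21, 90, 227, 434, 686, 924, \mathbf{1050}]\Delta^4 u_{-8}$
[19]  $u_0 - \frac{1}{2261}[136, 595, 1540, 3045, 5040, 7266, 9240, \mathbf{10230}]\Delta^4 u_{-9}$
[21]  $u_0 - \frac{1}{3059}[171, 760, 2005, 4060, 6930, 10416, 14070, 17160, \mathbf{18645}]\Delta^4 u_{-10}$

Quartic and Quintic

[7]   $u_0 + \frac{5}{231}[1]\Delta^6 u_{-3}$
[9]   $u_0 + \frac{5}{429}[3, \mathbf{7}]\Delta^6 u_{-4}$
[11]  $u_0 + \frac{1}{429}[18, 63, \mathbf{98}]\Delta^6 u_{-5}$
[13]  $u_0 + \frac{1}{2431}[110, 462, 987, \mathbf{1302}]\Delta^6 u_{-6}$
[15]  $u_0 + \frac{1}{46189}[2145, 10010, 24948, 42273, \mathbf{51198}]\Delta^6 u_{-7}$           (46.19)
[17]  $u_0 + \frac{1}{4199}[195, 975, 2665, 5148, 7623, \mathbf{8778}]\Delta^6 u_{-8}$
[19]  $u_0 + \frac{1}{7429}[340, 1785, 5190, 10875, 18018, 24453, \mathbf{27258}]\Delta^6 u_{-9}$
[21]  $u_0 + \frac{1}{260015}[11628, 63308, 192423, 426258, 759003, 1135134, 1450449, \mathbf{1581294}]\Delta^6 u_{-10}$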


(*) Kendall (1961a) has shown that the numbers in square brackets on the right in these formulae tend, as n and p increase, to the ordinates of a normal frequency function.

46.8 Several methods have been proposed to simplify the arithmetic of fitting a trend-line by moving averages, the large numbers in some of the expressions in (46.7) and (46.8) involving considerable labour in straightforward application. The simplest, perhaps, is that of iterated averages.

Suppose we take an average of sets of four with equal weights (a very simple process) and then another average of the same kind of that average. If the primary series is $u_t$ the result of the first operation will be to give a series typified by

$$v_1 = \tfrac{1}{4}(u_1 + u_2 + u_3 + u_4)$$

and that of the second operation to give

$$w_1 = \tfrac{1}{4}(v_1 + v_2 + v_3 + v_4) = \tfrac{1}{16}(u_1 + 2u_2 + 3u_3 + 4u_4 + 3u_5 + 2u_6 + u_7). \tag{46.20}$$

We may write this symbolically as

$$\tfrac{1}{16}[1, 1, 1, 1]^2 = \tfrac{1}{16}[1, 2, 3, \mathbf{4}] \tag{46.21}$$

or, reserving the symbol [k] for a simple sum of k consecutive terms (so that $\tfrac{1}{k}[k]$ is their arithmetic mean), as

$$\tfrac{1}{16}[4]^2 = \tfrac{1}{16}[1, 2, 3, \mathbf{4}]. \tag{46.22}$$

Now compare the weights of the average derived in Example 46.1 for fitting a cubic to seven points. Reduced to unit divisors, we have for the weights of the latter

-0.0952, 0.1429, 0.2857, 0.3333

and for the weights of (46.20)

0.0625, 0.1250, 0.1875, 0.2500.

The two are not identical, but they follow the same sort of course and it might be possible to regard the latter as an approximation to the former. (We shall derive better approximations presently, but this will serve for purposes of illustration.) Now the iterated summation resulting in (46.20) is much easier to carry out than the single weighted averaging process of Example 46.1. Generally, if we can find averages with simple integral weights, preferably unity, which will, in conjunction, give approximations to the more complicated weights of a single average, it is usually easier to use the iteration process.


46.9 In the notation of finite differences, write

$$\delta u_t = u_{t+\frac{1}{2}} - u_{t-\frac{1}{2}}. \tag{46.23}$$

We have, for the second "central" difference $\delta^2 u_t$,

$$\delta^2 u_t = (u_{t+1} - u_t) - (u_t - u_{t-1}) = (U - 2 + U^{-1})u_t. \tag{46.24}$$

Writing

$$U = \exp(2i\phi), \tag{46.25}$$

we find, symbolically,

$$\delta^2 = \exp(2i\phi) - 2 + \exp(-2i\phi) = -4\sin^2\phi. \tag{46.26}$$

Then

$$\sum_{j=-m}^{m} u_j = \sum_{j=-m}^{m} U^j u_0 = \left(1 + 2\sum_{j=1}^{m} \cos 2j\phi\right) u_0,$$

since the terms in $\sin 2j\phi$ vanish,

$$= \frac{\sin(2m+1)\phi}{\sin\phi}\, u_0. \tag{46.27}$$

Thus, with k = 2m+1,

$$\frac{1}{k}[k]\,u_0 = \frac{\sin k\phi}{k\sin\phi}\, u_0 = \left\{1 - \frac{k^2-1}{3!}\sin^2\phi + \frac{(k^2-1)(k^2-3^2)}{5!}\sin^4\phi - \ldots\right\} u_0 = \left\{1 + \frac{k^2-1}{24}\delta^2 + \frac{(k^2-1)(k^2-9)}{1920}\delta^4 + \ldots\right\} u_0. \tag{46.28}$$

This interesting formula gives the arithmetic average in terms of the middle term $u_0$ and its central differences.

If now our series is approximately represented by a cubic, so that fourth differences vanish, we have, taking $u_0$ as the middle term,

$$\frac{1}{k}[k]\,u_0 = u_0 + \frac{k^2-1}{24}\delta^2 u_0, \tag{46.29}$$

and this equation will in any case be true up to third differences. Similarly, for two iterated averages we have, to the same order,

$$\frac{1}{kl}[k][l]\,u_0 = u_0 + \tfrac{1}{24}(k^2 + l^2 - 2)\delta^2 u_0, \tag{46.30}$$

and so on. We will use these results to derive two formulae in very general use by actuaries for "graduating" a series, a process which is very similar to that of fitting a trend-line.


Example 46.3 Spencer's 15-point formula
Consider three successive averages with equal weights

$$\frac{1}{4 \cdot 4 \cdot 5}[4]^2[5]\,u_0 = u_0 + \tfrac{1}{24}(4^2 + 4^2 + 5^2 - 3)\delta^2 u_0 = u_0 + \tfrac{9}{4}\delta^2 u_0.$$

Multiplying by $1 - \tfrac{9}{4}\delta^2$, we then have, to third differences,

$$u_0 = \tfrac{1}{80}[4]^2[5]\,(1 - \tfrac{9}{4}\delta^2)\,u_0.$$

Substituting for $\delta^2$ the formula [1, -2, 1], as given by (46.24), we find

$$u_0 = \tfrac{1}{320}[4]^2[5]\,[-9, 22, -9]\,u_0.$$

Now without affecting the order of the approximation we may add factors in $\delta^4$ or higher central differences, and can simplify the numerical coefficients to some extent. Let us add to the factor [-9, 22, -9] a term $-3\delta^4 = [-3, 12, -18, 12, -3]$. The result is [-3, 3, 4, 3, -3], giving

$$u_0 = \tfrac{1}{320}[4]^2[5]\,[-3, 3, 4, 3, -3]\,u_0. \tag{46.31}$$

This is Spencer's 15-point formula. It covers sets of 15 consecutive terms, the weights in full being

$$\tfrac{1}{320}[-3, -6, -5, 3, 21, 46, 67, \mathbf{74}]. \tag{46.32}$$
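
The full weights of an iterated formula are obtained by convolving the component weight sets. The sketch below is illustrative only, with hypothetical function names: it builds Spencer's 15-point weights from two sums of four, one sum of five and the factor [-3, 3, 4, 3, -3], keeping the common divisor 320 separate so that the arithmetic stays in integers.

```python
def convolve(a, b):
    """Discrete convolution of two weight lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def spencer_15():
    """Integer weights of Spencer's 15-point formula (to be divided by 320)."""
    weights = [1]
    for factor in ([1] * 4, [1] * 4, [1] * 5, [-3, 3, 4, 3, -3]):
        weights = convolve(weights, factor)
    return weights

print(spencer_15())
```

In practice the iterated form is applied as successive running sums, which is the labour-saving point of the method; the convolution merely confirms the equivalent single set of weights.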


Example 46.4 Spencer's 21-point formula
In a similar way we find

$$\frac{1}{5 \cdot 5 \cdot 7}[5]^2[7]\,u_0 = u_0 + 4\delta^2 u_0,$$

giving, to third differences,

$$u_0 = \tfrac{1}{175}[5]^2[7]\,[-4, 9, -4]\,u_0.$$

We now add to the factor [-4, 9, -4] the expression

$$-3\delta^4 - \tfrac{1}{2}\delta^6 = [-3, 12, -18, 12, -3] + [-\tfrac{1}{2}, 3, -7\tfrac{1}{2}, 10, -7\tfrac{1}{2}, 3, -\tfrac{1}{2}],$$

giving

$$u_0 = \tfrac{1}{175}[5]^2[7]\,[-\tfrac{1}{2}, 0, \tfrac{1}{2}, 1, \tfrac{1}{2}, 0, -\tfrac{1}{2}]\,u_0 = \tfrac{1}{350}[5]^2[7]\,[-1, 0, 1, \mathbf{2}]\,u_0. \tag{46.33}$$

This is Spencer's 21-point formula.

46.10 Simplicity of calculation, however, is not nowadays as important as it once was, and for certain purposes these approximations are to be avoided. The original formulae (46.7) and (46.8) provide lines of closest fit for assigned extent and degree of polynomial. It follows that for given m and p the sum of squares of weighting coefficients is a minimum. In fact, if we apply the moving average to a series consisting of a polynomial trend of degree p plus a random residual $\varepsilon_t$ (which has the same distribution for all t) the residual sum of squares is given by
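$$\sum_t \Bigl(\sum_{j=0}^{2m} c_j\, \varepsilon_{t-m+j}\Bigr)^2$$

so that the expected variance of residuals is

$$\operatorname{var}\varepsilon \sum_{j=0}^{2m} c_j^2, \tag{46.34}$$

which is thus a minimum within the class of weights reproducing the same degree of polynomial for assigned m.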


End-effects 

46.11 The moving-average method as we have expounded it has obvious properties 
of symmetry. It also has the drawback of failing to provide trend-values for the 
first m and the last m terms of the series. As a rule it is not a great loss to have to 
forgo the values at the beginning, but the absence of trend-values at the end is a 
serious handicap, especially when we want to extrapolate into the future. We can 
fill the gap in various ways, recognizing that trend values at the end may not be so 
reliable as those in the middle. The method illustrated in the following example 
is probably as simple as any. 


Example 46.5 
Consider again the formula used in Example 46.1,
$$u_0 = \tfrac{1}{21}[-2, 3, 6, 7, 6, 3, -2].$$
We obtained this by fitting a cubic, but used that cubic only to determine the middle
point of a set of seven. There is no reason why we should not also use it to determine
the last three points of the end-set of seven. To do so, however, we need to solve
equations (46.6) for $a_1$, $a_2$, $a_3$ as well as for $a_0$.
We find
$$a_1 = \tfrac{1}{1512}\,(397\,\Sigma tu - 49\,\Sigma t^3 u),$$
$$a_2 = \tfrac{1}{84}\,(-4\,\Sigma u + \Sigma t^2 u),$$
$$a_3 = \tfrac{1}{216}\,(-7\,\Sigma tu + \Sigma t^3 u).$$
From these results substituted in the polynomial we find, in an obvious notation,
$$u_t = \tfrac{1}{21}[-2, 3, 6, 7, 6, 3, -2] + \tfrac{t}{252}[22, -67, -58, 0, 58, 67, -22]$$
$$\qquad + \tfrac{t^2}{84}[5, 0, -3, -4, -3, 0, 5] + \tfrac{t^3}{36}[-1, 1, 1, 0, -1, -1, 1]. \qquad (46.35)$$
Thus, for example, with $t = 1, 2, 3$, the expressions reduce to
$$u_1 = \tfrac{1}{42}[1, -4, 2, 12, 19, 16, -4], \qquad (46.36)$$
$$u_2 = \tfrac{1}{42}[4, -7, -4, 6, 16, 19, 8], \qquad (46.37)$$
$$u_3 = \tfrac{1}{42}[-2, 4, 1, -4, -4, 8, 39]. \qquad (46.38)$$
For example, if our last seven terms were 0, 1, 8, 27, 64, 125, 216, the fourth trend term
(the one following the middle term, where $t = 0$) would be
$$\tfrac{1}{42}[0 - 4 + 16 + 324 + 1216 + 2000 - 864] = 64.$$
The next is
$$\tfrac{1}{42}[0 - 7 - 32 + 162 + 1024 + 2375 + 1728] = 125$$
and the reader can verify that the last is 216. These results, of course, are exact
because we are fitting a cubic to a cubic.

One interesting phenomenon is to be noted here. In (46.35) to (46.38), as in the
formula for $u_0$, the coefficients sum to unity. But they become more and more unequal
so that their sum of squares, in general, increases as we go from $u_0$ to $u_3$. The variance
of any random residual term therefore increases, a reflection of the fact that, as we
depart more from the centre of the range which we have fitted, the polynomials become
less “reliable.” The sums of squares of coefficients in this case are

$u_0$: 0.3333, $u_1$: 0.4524, $u_2$: 0.4524, $u_3$: 0.9286.
The increase in sum of squares is not, however, monotonic. Some further results 
are given in Exercises 46.1 and 46.15. 

The coefficients and their sums of squares have been tabulated by Cowden (1962)
for $p \le 5$, $n \le 25$.
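The end-point weights of Example 46.5 can be checked numerically. The following is a minimal sketch in Python, not part of the original text: the helper names `solve` and `trend_weights` are hypothetical, and exact rational arithmetic is used so that the weights can be compared with (46.36) to (46.38) without rounding.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve a small linear system exactly by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(b)]
           for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Find a non-zero pivot and bring it to the diagonal.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def trend_weights(m, p, t0):
    """Weights w_j (j = -m..m) such that sum_j w_j u_j is the value at t0 of
    the least-squares polynomial of degree p fitted to u_{-m}, ..., u_m."""
    ts = range(-m, m + 1)
    # Normal-equation matrix: entry (r, c) is the sum of t^(r+c).
    normal = [[sum(Fraction(t) ** (r + c) for t in ts) for c in range(p + 1)]
              for r in range(p + 1)]
    x0 = [Fraction(t0) ** r for r in range(p + 1)]
    c = solve(normal, x0)          # the normal matrix is symmetric
    return [sum(c[r] * Fraction(t) ** r for r in range(p + 1)) for t in ts]


cubes = [0, 1, 8, 27, 64, 125, 216]
for t0 in range(4):
    w = trend_weights(3, 3, t0)
    fitted = sum(wi * ui for wi, ui in zip(w, cubes))
    print(t0, [str(x) for x in w], "sum of squares:",
          round(float(sum(x * x for x in w)), 4), "fitted:", fitted)
```

The same function gives weights for other choices of m and p, which is one way of examining how the sum of squares behaves as the fitted point moves away from the centre.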


46.12 Results of the foregoing kind may also very conveniently be obtained by
the use of orthogonal polynomials (28.18, Vol. 2). If we put $n = 2m+1$ we find, for
the first four polynomials,

$$\phi_0(t) = 1$$
$$\phi_1(t) = \lambda_1 t$$
$$\phi_2(t) = \lambda_2 \{t^2 - \tfrac{1}{3} m(m+1)\} \qquad (46.39)$$
$$\phi_3(t) = \lambda_3 \{t^3 - \tfrac{1}{5}(3m^2 + 3m - 1)\,t\}$$
$$\phi_4(t) = \lambda_4 \{t^4 - \tfrac{1}{7} t^2 (6m^2 + 6m - 5) + \tfrac{3}{35}(m-1)\,m\,(m+1)(m+2)\}$$

If now, for example,
$$u_t = b_0 + b_1 \phi_1 + b_2 \phi_2 + b_3 \phi_3 \qquad (46.40)$$
we have at once
$$u_0 = b_0 - \tfrac{1}{3} \lambda_2\, m(m+1)\, b_2. \qquad (46.41)$$
Moreover, from the tables of the function the values of $\lambda$ can be obtained and
also the values $\Sigma \phi^2$. We have, in virtue of the orthogonality property,

$$b_0 = \frac{\Sigma\, u_t \phi_0}{\Sigma\, \phi_0^2} = \frac{\Sigma\, u_t}{n}, \qquad (46.42)$$

$$b_j = \frac{\Sigma\, u_t \phi_j}{\Sigma\, \phi_j^2}. \qquad (46.43)$$

For example, with a cubic fitted to sets of 7, $m = 3$, and we have, from (46.42), (46.43)
and the tables,
$$b_0 = \tfrac{1}{7} \Sigma u, \qquad b_2 = \tfrac{1}{84} \Sigma (t^2 - 4) u.$$
Hence, from (46.40),
$$u_0 = \tfrac{1}{7} \Sigma u - \tfrac{1}{21} (\Sigma t^2 u - 4 \Sigma u)$$
$$\quad = \tfrac{1}{21} (7 \Sigma u - \Sigma t^2 u),$$
as in Example 46.1. A similar use of the polynomials will give us the end-values
discussed in 46.11.
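As a check on the orthogonal-polynomial route, the sketch below (an illustration, not taken from the original) builds the polynomials of (46.39) for $m = 3$ with every $\lambda$ set to 1, which is an assumption made only for simplicity, and recovers both the central weights and the end-point weights. The names `phi` and `weights_at` are hypothetical.

```python
from fractions import Fraction

m = 3
ts = list(range(-m, m + 1))


def phi(k, t):
    """Orthogonal polynomials of (46.39), with every lambda taken as 1."""
    t = Fraction(t)
    if k == 0:
        return Fraction(1)
    if k == 1:
        return t
    if k == 2:
        return t ** 2 - Fraction(m * (m + 1), 3)
    if k == 3:
        return t ** 3 - Fraction(3 * m * m + 3 * m - 1, 5) * t
    raise ValueError("only degrees 0 to 3 are needed here")


def weights_at(t0, degree=3):
    """Weights of the fitted value at t0, using b_j = sum(u phi_j) / sum(phi_j^2)."""
    w = [Fraction(0)] * len(ts)
    for k in range(degree + 1):
        norm = sum(phi(k, t) ** 2 for t in ts)
        w = [wi + phi(k, t0) * phi(k, t) / norm for wi, t in zip(w, ts)]
    return w


# Orthogonality over t = -3, ..., 3.
for a in range(4):
    for b in range(a):
        assert sum(phi(a, t) * phi(b, t) for t in ts) == 0

print([str(x) for x in weights_at(0)])   # central weights
print([str(x) for x in weights_at(3)])   # weights for the last point
```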


46.13 We have as yet said nothing about criteria by which we should decide the 
extent of a moving average, $2m+1$, or the degree of the polynomial, $p$, on which it
should be based. There are, in fact, no simple criteria of this kind. One important 
reason for this is that a great deal depends on why we are interested in isolating the 
trend, or, to put the matter in a rather different way, what is the underlying model 
which determines our dissection of the series. If we are concerned chiefly to describe 
a broad trend in the data, and are not particularly interested in short-term and residual 
effects, one type of moving average may be adequate. But if we want to remove the 
trend in order to study the residuals, such a type may be quite inappropriate; and 
indeed, for some purposes, we may well question whether it is safe to eliminate trend 
by a moving average at all. Before, then, we can adequately discuss the choice of 
a suitable method of finding the trend we must consider the effect of our methods 
on residual variation. 


The effect of trend-elimination by moving averages on other components 

46.14 In Table 46.1 we have applied the Spencer 21-point formula to an artificial 
series obtained by adding a random element to a cubic. (We have chosen this formula 
rather than one of (46.7) because the effect of successive simple averages can also be 
seen.) Specifically, 

$$u_t = (t-26) + \tfrac{1}{10}(t-26)^2 + \tfrac{1}{100}(t-26)^3 + \varepsilon_t. \qquad (46.44)$$
The component $\varepsilon_t$ was taken from tables of random numbers and consists of samples
from a population in which all integral values from 0 to 99 are equally frequent. The
various columns of the table illustrate the process of fitting, and we may note in passing 
that for a series as short as this it is convenient to leave the more difficult summations 
to the last as there are substantially fewer of them. 

Now we know that the Spencer formula will fit a cubic exactly, so that when we
subtract the trend from the original series we ought to eliminate the systematic
constituent entirely and be left with our random component, except in so far as we have
rounded off the systematic element to the nearest unit. A comparison of columns
(2) and (9) in Table 46.1, remembering that the latter includes an element 49.5 equal
to the mean of the random component, shows that we do not do so. The reason is
not far to seek. The moving average has acted on the random element itself and
determined a “trend-line” in it.

The results of applying the Spencer 21-point formula to the random element $\varepsilon_t$
are shown in column (11). We should expect that if the method were perfect the values
in this column would be 49.5, the mean of $\varepsilon_t$, apart from irregular sampling effects;

Table 46.1—Series given by equation (46.44) with trend-line determined
by a Spencer 21-point formula

(1) | (2)        | (3)             | (4)   | (5)      | (6)    | (7)    | (8)                   | (9)     | (10)                | (11)
t   | Cubic term | $\varepsilon_t$ | $u_t$ | [5]$u_t$ | [5](5) | [7](6) | [-1,0,1,2,1,0,-1](7)  | (8)/350 | Deviation $u_t$-(9) | Graduation of $\varepsilon$ alone

 1 | -119 | 23 |  -96 |      |      |       |       |     |     |
 2 | -105 | 15 |  -90 |      |      |       |       |     |     |
 3 |  -92 | 75 |  -17 | -246 |      |       |       |     |     |
 4 |  -80 | 48 |  -32 | -209 |      |       |       |     |     |
 5 |  -70 | 59 |  -11 |  -87 | -572 |       |       |     |     |
 6 |  -60 |  1 |  -59 |  -42 | -241 |       |       |     |     |
 7 |  -51 | 83 |   32 |   12 |  162 |       |       |     |     |
 8 |  -44 | 72 |   28 |   85 |  413 |  2233 |       |     |     |
 9 |  -37 | 59 |   22 |  194 |  670 |  3801 |       |     |     |
10 |  -31 | 93 |   62 |  164 |  844 |  5120 |       |     |     |
11 |  -26 | 76 |   50 |  215 |  957 |  5984 | 14352 |  41 |   9 | 67
12 |  -22 | 24 |    2 |  186 |  996 |  6642 | 15470 |  44 | -42 | 66
13 |  -18 | 97 |   79 |  198 | 1078 |  7041 | 15815 |  45 |  34 | 63
14 |  -15 |  8 |   -7 |  233 | 1026 |  7145 | 15676 |  45 | -52 | 60
15 |  -12 | 86 |   74 |  246 | 1071 |  7038 | 14978 |  43 |  31 | 55
16 |  -10 | 95 |   85 |  163 | 1069 |  6934 | 14166 |  40 |  45 | 51
17 |   -8 | 23 |   15 |  231 |  948 |  6709 | 13379 |  38 | -23 | 47
18 |   -7 |  3 |   -4 |  196 |  850 |  6535 | 12703 |  36 | -40 | 43
19 |   -6 | 67 |   61 |  112 |  892 |  6408 | 12169 |  35 |  26 | 40
20 |   -5 | 44 |   39 |  148 |  853 |  6363 | 12102 |  35 |   4 | 39
21 |   -4 |  5 |    1 |  205 |  852 |  6446 | 12279 |  35 | -34 | 39
22 |   -3 | 54 |   51 |  192 |  944 |  6611 | 12676 |  36 |  15 | 39
23 |   -2 | 55 |   53 |  195 | 1024 |  6769 | 13228 |  38 |  15 | 40
24 |   -2 | 50 |   48 |  204 | 1031 |  7052 | 13857 |  40 |   8 | 41
25 |   -1 | 43 |   42 |  228 | 1015 |  7353 | 14508 |  41 |   1 | 42
26 |    0 | 10 |   10 |  212 | 1050 |  7610 | 15120 |  43 | -33 | 43
27 |    1 | 74 |   75 |  176 | 1136 |  7923 | 15634 |  45 |  30 | 44
28 |    2 | 35 |   37 |  230 | 1153 |  8249 | 16251 |  46 |  -9 | 44
29 |    4 |  8 |   12 |  290 | 1201 |  8607 | 17002 |  49 | -37 | 45
30 |    6 | 90 |   96 |  245 | 1337 |  9019 | 17717 |  51 |  45 | 44
31 |    9 | 61 |   70 |  260 | 1357 |  9424 | 18499 |  53 |  17 | 44
32 |   12 | 18 |   30 |  312 | 1373 |  9870 | 19307 |  55 | -25 | 43
33 |   15 | 37 |   52 |  250 | 1462 | 10429 | 20159 |  58 |  -6 | 42
34 |   20 | 44 |   64 |  306 | 1541 | 10989 | 21133 |  60 |   4 | 41
35 |   24 | 10 |   34 |  334 | 1599 | 11679 | 22317 |  64 | -30 | 39
36 |   30 | 96 |  126 |  339 | 1760 | 12539 | 23797 |  68 |  58 | 38
37 |   36 | 22 |   58 |  370 | 1897 | 13529 | 25737 |  74 | -16 | 37
38 |   44 | 13 |   57 |  411 | 2047 | 14699 | 27955 |  80 | -23 | 36
39 |   52 | 43 |   95 |  443 | 2233 | 16060 | 30456 |  87 |   8 | 35
40 |   61 | 14 |   75 |  484 | 2452 | 17570 | 33334 |  95 | -20 | 34
41 |   71 | 87 |  158 |  525 | 2711 | 19353 | 36716 | 105 |  53 | 34
42 |   83 | 16 |   99 |  589 | 2960 | 21394 |       |     |     |
43 |   95 |  3 |   98 |  670 | 3270 | 23690 |       |     |     |
44 |  109 | 50 |  159 |  692 | 3680 | 26255 |       |     |     |
45 |  124 | 32 |  156 |  794 | 4088 |       |       |     |     |
46 |  140 | 40 |  180 |  935 | 4529 |       |       |     |     |
47 |  158 | 43 |  201 |  997 | 5017 |       |       |     |     |
48 |  177 | 62 |  239 | 1111 |      |       |       |     |     |
49 |  198 | 23 |  221 | 1180 |      |       |       |     |     |
50 |  220 | 50 |  270 |      |      |       |       |     |     |
51 |  244 |  5 |  249 |      |      |       |       |     |     |
but not only do the observed values deviate from this mean, they do so systematically,
the values having a small oscillatory movement which is shown as part of the trend. 
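The claim that the Spencer formula passes a cubic unchanged can be illustrated with a short sketch. It assumes the usual factorisation of the 21-point formula into two sums of five, one sum of seven and the weights [-1, 0, 1, 2, 1, 0, -1], with divisor 350, which is the sequence of operations that the columns of Table 46.1 follow. The function names are hypothetical, and only the systematic part of (46.44) is used, so no random numbers are needed.

```python
from fractions import Fraction


def convolve(a, b):
    """Coefficients of the product of two weight sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


# [5][5][7] followed by [-1, 0, 1, 2, 1, 0, -1]; the divisor 350 is applied later.
SPENCER_21 = convolve(convolve([1] * 5, [1] * 5),
                      convolve([1] * 7, [-1, 0, 1, 2, 1, 0, -1]))


def spencer(series):
    """Trend values for the terms that have ten neighbours on each side."""
    half = len(SPENCER_21) // 2
    return [sum(Fraction(w) * x
                for w, x in zip(SPENCER_21, series[i - half:i + half + 1])) / 350
            for i in range(half, len(series) - half)]


def cubic(t):
    """Systematic part of (46.44), kept exact (not rounded to the nearest unit)."""
    x = Fraction(t - 26)
    return x + x ** 2 / 10 + x ** 3 / 100


trend = [cubic(t) for t in range(1, 52)]
smoothed = spencer(trend)

print(SPENCER_21)
print(smoothed == trend[10:41])          # the cubic is reproduced for t = 11..41
# Factor by which the variance of a purely random series is reduced:
print(float(Fraction(sum(w * w for w in SPENCER_21), 350 ** 2)))
```

Since the systematic part is reproduced exactly, whatever structure remains in columns (9) and (11) of the table must come from the action of the average on the random element.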


46.15 This effect is vital, particularly if we are eliminating trend so as to concentrate
attention on oscillations. We proceed to examine it more closely.
Suppose that we have a series composed of the sum of three parts, a trend $x_1(t)$,
an oscillatory term $x_2(t)$, and a random element $x_3(t)$, so that
$$u_t = x_1 + x_2 + x_3. \qquad (46.45)$$
If we determine the trend by a moving average, denoted by an operation $T$, then
clearly
$$T u_t = T x_1 + T x_2 + T x_3. \qquad (46.46)$$
Let us now suppose that our method of determining trend is perfect in the sense that
$T x_1 = x_1$. Then, on subtracting (46.46) from (46.45) to eliminate trend, we find
$$u_t - T u_t = x_2 - T x_2 + x_3 - T x_3. \qquad (46.47)$$
The point of present interest is that the terms $T x_2$ and $T x_3$ in (46.47) may distort
the genuinely oscillatory parts of the residual series and induce spurious oscillatory
movements.


46.16 Consider the simple case when $x_2$ is a sine term, $\sin(\alpha + \lambda t)$, $t$ being integral.
Since
$$\sum_{t=1}^{k} \sin(\alpha + \lambda t) = \frac{\sin \tfrac{1}{2} k\lambda}{\sin \tfrac{1}{2}\lambda}\, \sin\{\alpha + \tfrac{1}{2}(k+1)\lambda\}, \qquad (46.48)$$
a simple moving average of $k$ terms will result in a sine series of the same
period and phase as the original, but with the amplitude reduced by the factor
$$\frac{1}{k}\, \frac{\sin \tfrac{1}{2} k\lambda}{\sin \tfrac{1}{2}\lambda}. \qquad (46.49)$$
Iteration $q$ times will reduce the amplitude by the $q$th power of this factor.

Thus the term $T x_2$ will be small if $k$ is large, $q$ is large, or if $\tfrac{1}{2} k\lambda$ is a multiple of
$\pi$, that is, if the extent of the moving average is a whole number of periods of the
oscillation. But if $\lambda$ is small and $k\lambda$ is small the amplitude is reduced very little and
$x_2 - T x_2$ will largely disappear, i.e. the moving average will partially obliterate the
term in $x_2$. In this case, $k\lambda$ being small, the extent of the moving average is small
compared with the period of the harmonic term, that is to say the oscillation is a slow one.
This result is what we should expect. A slow oscillation is treated as a trend by the moving
average and eliminated accordingly. Generally, the moving average will emphasize the shorter
oscillations at the expense of the longer ones. Furthermore, if the extent of the average
is slightly greater than the period, the term (46.49) may have a negative sign, and
consequently the difference from the trend may somewhat exaggerate the true oscillations.

It is not so easy to exhibit the precise effect of the moving average when the weights
are unequal and the terms are not harmonic, but evidently the same kind of situation
is apt to arise.
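The amplitude factor (46.49) is easy to explore numerically. The sketch below is illustrative only: the function names `gain` and `moving_average` and the particular values of k and of the period are assumptions chosen to show the three cases discussed in the text (a slow oscillation, an extent equal to the period, and an extent slightly greater than the period).

```python
import math


def gain(k, lam, q=1):
    """Amplitude factor (46.49) for a simple average of k terms, iterated q times."""
    return (math.sin(0.5 * k * lam) / (k * math.sin(0.5 * lam))) ** q


def moving_average(x, k):
    """Simple (equal-weight) moving average of k consecutive terms."""
    return [sum(x[i:i + k]) / k for i in range(len(x) - k + 1)]


k = 12
for period in (120, 12, 10):
    lam = 2 * math.pi / period
    print(f"period {period:3d}: factor {gain(k, lam):+.4f}")

# Direct check against an averaged sine series, here with k = 5.
alpha, lam = 0.7, 0.3
x = [math.sin(alpha + lam * t) for t in range(40)]
averaged = moving_average(x, 5)
centre = 2                                    # the window t = 0..4 is centred on t = 2
print(averaged[0], gain(5, lam) * math.sin(alpha + lam * centre))
```

With a period of 120 the factor is close to 1, so the oscillation is absorbed into the trend; with a period of 12 it vanishes; with a period of 10 it is negative, so subtracting the trend enlarges the oscillation.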
46.17 Now consider the effect of a simple moving average (that is, one with equal
weights) on the residual element $x_3$, which we will suppose to be a random element
$\varepsilon_t$ with variance $v$. For the term $T x_3$ we have

$$T x_3 = \frac{1}{k} \sum_{j=-[\frac{1}{2}k]}^{[\frac{1}{2}k]} \varepsilon_{t+j}, \qquad (46.50)$$

where $[\tfrac{1}{2}k]$ is the greatest integer which does not exceed $\tfrac{1}{2}k$. Consecutive values of
$\varepsilon_t$ are independent, but consecutive values of $T x_3$ are not; for $T x_3(a)$ and $T x_3(b)$ have
$k - (a-b)$ values of $\varepsilon$ in common and are correlated if $a - b < k$. Thus the series $T x_3$
will be much smoother than $x_3$, and if we proceed to further averagings will become
smoother still. We have had an example of this effect in Table 46.1 and shall meet
further examples below.


46.18 The effect of taking a moving average of a random series will then be to 
generate an oscillatory series, provided that the weights are such as to give a positive 
correlation between successive members of the generated series, a condition which is 
always realized in moving averages employed for trend-fitting. We shall call this 
the Slutzky-Yule effect, after the two statisticians who (independently) studied it in 
detail. 

The generated series is not regular in the cyclical sense, that is to say its peaks and 
troughs do not recur at equal intervals of time, and the amplitudes of the oscillations 
vary considerably (although, in Chapter 49, we shall prove a theorem of Slutzky's 
showing that certain kinds of iterated average generate a sine curve). Nevertheless 
such oscillations present a striking resemblance to the kind of movement which is 
found in practice, particularly in economic time-series, and we shall consider them in 
more detail later. For our present purposes we require to consider how far the process 
of trend-elimination itself may generate such effects, in order to be sure that oscillatory 
movements in a trend-free series have not been put there, so to speak, by our own 
arithmetical processes. 


46.19 For this purpose we shall consider the period and variance of a series gener- 
ated by the Slutzky-Yule effect. 

Since the peaks and troughs do not recur at equal intervals there is no quantity 
which we can conveniently call the length of the oscillation. There will, in fact, 
be a distribution of lengths. We may define as the mean length either the mean 
period from peak to peak, or that from trough to trough; but this raises some difficulties 
as to whether we are prepared to admit as periods small ripples on the main undulation. 

Recognizing its somewhat arbitrary character, we shall take as our measure of
oscillatory length the mean distance between “upcrosses,” that is to say the mean
distance between points where the series changes sign from negative to positive or
“crosses the u-axis.” Suppose the series is generated by a moving average with
weights $a_1, \ldots, a_k$ of a random variable which is normally distributed with variance $v$.
Then the probability that

$$u_1 = \sum_{j=1}^{k} a_j \varepsilon_j < 0 \qquad (46.51)$$
and
$$u_2 = \sum_{j=1}^{k} a_j \varepsilon_{j+1} > 0, \qquad (46.52)$$

i.e. that the generated series changes sign from negative to positive, is the proportional
frequency of

$$dF = \frac{1}{(2\pi v)^{\frac{1}{2}(k+1)}} \exp\left\{-\frac{1}{2v} \sum_{j=1}^{k+1} \varepsilon_j^2\right\} d\varepsilon_1 \ldots d\varepsilon_{k+1} \qquad (46.53)$$

between the hyperplanes $\Sigma a_j \varepsilon_j = 0$ and $\Sigma a_j \varepsilon_{j+1} = 0$. This is equal to $\theta/2\pi$, where
$\theta$ is the angle between these two planes, which is given by

$$\cos\theta = \sum_{j=1}^{k-1} a_j a_{j+1} \Big/ \sum_{j=1}^{k} a_j^2. \qquad (46.54)$$

$\theta/2\pi$ is the expected number of upcrosses per unit time, and thus $2\pi/\theta$ is approximately
the mean distance between upcrosses.


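46.20 In a similar way, the probability that

$$u_3 - u_2 < 0 \qquad (46.55)$$
$$u_2 - u_1 > 0, \qquad (46.56)$$
that is, that $u_2$ is a peak of the series, is $\theta_1/2\pi$, where $\theta_1$ is the angle between the
two hyperplanes
$$\sum_{j=1}^{k} a_j \varepsilon_{j+2} - \sum_{j=1}^{k} a_j \varepsilon_{j+1} = 0 \qquad (46.57)$$
$$\sum_{j=1}^{k} a_j \varepsilon_{j+1} - \sum_{j=1}^{k} a_j \varepsilon_j = 0 \qquad (46.58)$$

and is given by
$$\cos\theta_1 = \frac{(a_2 - a_1)a_1 + (a_3 - a_2)(a_2 - a_1) + \ldots - a_k(a_k - a_{k-1})}{a_1^2 + (a_2 - a_1)^2 + \ldots + a_k^2}. \qquad (46.59)$$
Thus the mean distance between peaks is $2\pi/\theta_1$. The same formula obviously applies
to mean distance between troughs.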


46.21 If we wish to exclude “ ripples ” of a certain length d from consideration, 
we may enquire for the probability that (46.57) and (46.58) are satisfied in conjunction 
with 

Uy yug. (46.60) 
This is evidently the area cut off on the unit sphere by the three planes (46.57), (46.58)
and 
Xa,c,-X аа = 0. (46.61) 
If the angles between the planes are $A$, $B$ and $C$, this area is $A + B + C - 2\pi = s$,
say. The mean length between peaks, ripples excepted, is then $4\pi/s$.


Example 46.6 
In Table 46.2 we show 480 terms of a series of random numbers which can take
integral values from 0 to 19, together with a moving sum of fives of a moving sum of
threes. Fig. 46.1 shows a portion of the derived series graphically. There are 474
terms of the smoothed series.
Table 46.2—Series of 480 terms of a rectangular random series $\varepsilon$ and a
[5] [3] smoothing $S$

t | ε | S | t | ε | S | t | ε | S | t | ε | S | t | ε | S | t | ε | S
1| 3 ві | 17 |197 |161) 9 | 140} 241) 1 | 99 17 | 196} 401 | 17 | 191 
2| 15 82| 11 | 2000162) 14 | 122] 242) 17 | во 15 | 209} 402| 3 | 205 
3| 15 83) 6 | 206] 163) 1 | 117} 243) 0 | 75 13 | 194 | 403 | 18 | 198 
*| 8 | 164 17 | 215 п | 9442 IES 10 | 179 |404 14 |192 
5| 19 | 147] 85| 18 | 2281165 1 | 98|245| 0 | 94 18 | 151] 405 | 14 | 191 
$| 1 | 143] 86| 19 | 2300166, 8 | 93|245 | 6 | 1241326) 0 | 133] 406 | 13 | 197 
2| 3 | 145] 87| 15 | 220] 167 | 2 | 106] 247| 17 | 1691327) 9 | 112} 407| 5 | 205 
8| 12 | 165] 88| 13 | 198] 168 | 18 | 103] 248) 16 | 195 |328 8 | 108 |408) 19 | 204 
9| 19 | 175] 89) в |175|169| 1 | 121] 249) 17 | 204 |329) 3 | 105 | 409) 17 | 202 
10| 13 | 196} 90) 10 | 1590120) 7 | 117 | 250) 15 |191 |330) 9 | 111] 410) 18 | 192 
11| 16 | 191] 91| 14 | 158 | 171 |. 9 | 127] 251 175 | 331) 12 | 107] 411] 5 | 174 
12| 4 | 178] 92) 5 | 158] 172 | 13 | 120] 252| 14 | 150|332| 3 |101|412| 7 | 140 
13| 17 | 159] 93| 12 | 159 | 173 | 2 | 137 |253) 9 | 1449333) 8 | 854413 | 15 | 107 
14| 8 | 150 18 | 153 |174) 16 | 139 | 254) 11 | 131]|334| 5 | 77} 414) 1 | 86 
15| 6 |134| 95| 1 |145 0175) 1 |145 |255) 3 |135|335 | 1 | 751415 4 | 66 
16| 15 |118 96| 14 | 124| 176 | 17 | 142|256 | 15 | 125 1336) 2 | 93|416| 2 | 58 
17| 3 |10]| 97| 8 |112| 177 | 13 | 145] 257) 1 |138| 337 | 13 | 107] 407 |. 3 | 50 
18| 3 | ss| 98| 1 |108| 178 |. 0 | 149] 258 | 14 | 142|338 |. 6 |134|418|. 2 | 62 
19| 7 | s2|.99| 5 |123| 179 | 15 | 157] 259| 9 | 162 |339 | 15 | 151 | 419 | 10 | 78 
20| 4 |100|100| 13 |1531|180| 7 |166|260| 13 |166|340| 7 |161|420| 0 | 105 
21| 5 |126| 102 | 11 | 150] 181 | 16 | 167] 261) 11 | 182] 341| 13 | 1060|421 | 9 | 126 
22| 14 | 150 | 102 | 14 | 151] 182 | 16 | 171] 262| 15 | 190] 342 | 13 | 162] 422 | 16 | 146 
23| 15 | 157 | 103 | 6 | 140| 183 |. 7 | 1691203) 8 |203|343| 5 | 155 | 423 | 9 | 152 
24| 10 | 150 | 104 | 13 | 120] 1 6 |174| 2 17 | 210] 344 | 15 | 153] 424 | 11 | 144 
25 153 | 105 | 1 | 119 |185) 13 | 168 |265) 19 |214| 345 | 10 | 162 | 425 | 12 | 124 
26| 10 |156 |106) 4 |120|186 | 17 | 170|266 | 10 | 211 |346) 3 | 174|426 | 2 | 106 
27| 13 | 165] 107 | 13 | 133 |187, f4 | 170]267 | 17 | 158 | 347 | 18 | 176 | 427 | 3 | 106 
28 | 14 | 175 | 108 | 13 | 147 |188, 2 |159|268 | 11 | 163|348 | 19 | 177| 428 | 9 | 119 
29| 15 |108 |109) 8 |172| 189 15 269| 9 |146 |349) 8 | 1741429) 6 | 139 
301 8 |160|110| 12 | 186] 190 | 9 | 139 |270) 1 |153|350| 5 | 173|430 | 17 | 159 
31) 10 | 154 | 111 | 12 | 195 | 192 |. 1 |135|271 | 11 | 154] 351 | 16 | 159 | 431 | 15 | 174 
32| 1 | 156]112) 19 |204|192| 15 | 151 | 272) 17 | 162|352 | 7 | 157] 432) 5 | 179 
33| 18 | 154 | 113 | 13 |203| 193 | 16 | 147|273 | 17 | 155 | 353 | 16 | 157 | 433 | 14 | 172 
34| 17 |165| 94 | 11 | 1854 | 194 | 6 274 | 4 |154|354 | 8 | 169] 434 14 | 155 
35| 4 11641115 | 18 | 156] 195 | 10 |132|]275 |. 8 | 137 |355) 6 | 168|435| 9 | 133 
36| 10 | 1590116) 2 |135 |196) 5 |128|276 | 2 |134|356 | 15 | 165 | 436 | 8 | 107 
37| 16 |138 |117) 4 |121| 197 | 7 | 1220277) 18 | 131 | 357 | 19 | 153 | 437 |. 3 | 75 
38| 2 |137| 118 | 10 | 111] 198) 8 |126|2758 | 8 | 1720358) 4 |150|438| 1 | 53 
39| 13 | mi]|1 | 8 |116] 199 | 18 | 120|279 | 9 |184|359 | 5 |133|439| 3 | 55 
40 140 | 120 | 10 | 131 [200 O | 1219 280| 19 | 185 |360 9 |120|440| 1 | 72 
41 14 |135 |121) 3 |145 |201) 7 | 1054281) 17 | 167] 361| 12 | 117] 441) 5 | 94 
42 146 | 122 | 16 | 156 | 202 9 | 99) 282) 4 | 1500362 2 | 127] 442/ 16 | 96 
43| 16 | 1417123 | 12 | 173] 203} 5 | 93|283| 8 | 123] 363| 11 | 118] 443| 8 | 91 
44| 3 |139| 124 | 8 |175|204| 3 | 95] 28%) 5 | 115] 364| 12 | 112] 444 | 2 | 78 
45 | 10 |117 |125) 19 | 160} 205| 12 | 914285) 6 | 131] 365) 5 105 |445) 0 | 75 
46| 12 | 961126) 11 | 145 |206) 4 | 931286) 7 | 168 |366 0 | 10014436) 2 | 85 
47| o | 75] 127| 1 | 129] 207) 2 | 97] 287| 19 | 186] 367| 12 | 84] 447| 7 | 109 
48| 3 | és|128| 4 | 123] 208| 11 | 97|288| 19 | 196|368| 6 | 881448 | 17 | 124 
49| 2 | 61] 129| 16 | 108} 209) 6 | 107] 289| 16 | 188 |369) 2 | 96] 449) 12 | 124 
50} 3 | z|10| 3 |115 0210) 7 | 115] 290) 2 | 180]370| 4 | 1041450) 5 | 117 
51| 10 | 840131) 13 |108 (211, 6 | 1284291 | 10 | 158 | 371 | 15 | 109] 451 | 2 | 106 
52| 5 | 914132] о |118 0212 | 15 | 125 0292) 12 | 1471372) 6 | 130] 452) 2 | 97 
53| 10 | 924133) 10 | 1124 213| 4 | 130] 293) 15 | 145 |373) 5 | 148 | 453 | 15 | 92 
54| 3 | 101] 134) 4 | 122 |214) 13 | 126] 204) 5 | 1484374) 14 | 156] 454 |. 8 | 100 
55| 2 |19]|135| 19 |113 0215) 4 |125 |295) 6 | 1459375) 14 | 164) 455| 2 |in 
56| 1 |14 [136| 3 | 110] 216| 7 | 123 | 296 | 15 | 134|376 | 11 | 1890] 456 | 4 | 120 
57| 14 | 166 | 137 |. 4 | 100] 217| 13 | 119] 297| 6 | 137 |377) 8 | 187] 457| 11 | 121 
58| 18 | 190} 1333 | 7 | 103] 218) 4 | 111] 298) 13 | 136] 378 | 15 | 174] 458 | 15 | 119 
59| s |212|139| 0 | 106] 219| 13 | 101 |299 | 2 | 136] 379| 18 |151 |459) 8 | 110 
60 | 14 | 211] 130 | 16 | 107] 220) © | 91 [300 ) 14 | 129] 380 | 7 |127|460 | 3 | 98 
61) 15 |204 |141) 13 | 1020221) 3 | 82]301| 9 |128 |381) 1 | 99] 461) 1 | 98 
62| 17 | 191] 142 | 0 | 1034222) 11 | 76] 302| 4 (1151382) 2 | 88 |462) 4 | 121 
63| 7 |185|143| 2 | 114] 223| 0 | 25 |303) 14 | 100|383 |. 7 | 89] 463 | 13 | 150 
64| 9 |166|144| 4 | 1271224) 10 | 721304) 2 | 93|384| 4 | 119] 464| 17 | 170 
65| 11 | 1600145, 18 | 136] 225) 1 | 86] 305| о | 96] 385) 18 | 143] 465 | 19 | 176 
66 | 14 | 167] 146 | 18 226) 4 | 92|306) в | 109] 386) 5 | 166] 466| 5 | 169 
67| 5 [188 1147) 6 | 131] 227) 6 | 1090307, 12 | 131 |387) 17 | 170] 467| 4 | 149 
68 | 17 |203 |148) о | 121 | 228) 18 | 116] 308 | 10 | 159 | 388 | 12 | 179] 468 | 15 | 136 
69| 18 | 2044149] 4 | 120] 229) 3 | 1390309) 11 | 178 |389 7 | 179] 469| 8 | 137 
70| 13 |205|150| 11 | 1371230) 7 | 149] 310| 17 | 187] 390| 14 | 184] 470 | 6 | 136 
71 | 18 |185 | 151 | 15 | 162] 231 | 12 | 1497311 | 10 |200|391 | 15 | 190] 471 | 14 | 133 
72| 0 | 1719 152) 15 | 179} 232) 15 | 141]|312| 12 | 212]392| 11 | 194] 472 | 9 | 126 
73 | 19 | 149 153 | 11 | 188] 233 | 11 | 1370313) 17 | 216] 393) 13 | 201 |473) 0 |125 
74| 1 | 146]154) 9 | 1840234) 1 | 134) 314] 17 | 211] 394) 18 | 199] 474 | 15 | 109 
75 | 14 | 130]155| 12 | 183 |235) 8 |128 |315, 16 | 192|395| 6 |193|475 | 7 | 103 
76| 12 135 |156) 18 | 175 | 236 9 | 130] 316 173 | 396 | 19 | 178 1476 5 | 96 
77| 2 |139|157| 7 | 174] 237| 14 | 1324317] 9 | 151] 397) 13 |173 |477) 1 | 95 
78| 7 |1600 158) 15 | 160] 238| 9 | 128 1318) 2 | 152] 398| 1 | 178] 478| 11 
79| 16 | 175] 159 | 5 | 157] 239) 6 | 1221319) 16 | 160|399 | 13 |178 |479) 5 
80) 15 | 188] 160) 11 | 141] 240) 7 | 108 | 320 10 185 |400) 18 | 1831480) 5 
The mean value of our series is 15 × 9.5 = 142.5. The number of upcrosses will
be found from the table to be 23, the first between the 19th and 20th term of the smoothed
series, the last between the 459th and the 460th. The mean distance between upcrosses
is then 440/22 = 20 units. How does this compare with the mean distance given
by “normal” theory?

The weights of the graduation are [1, 2, 3, 3, 3, 2, 1] and from (46.54) we have

$$\cos\theta = \tfrac{34}{37} = 0.9189, \qquad \theta = 23^\circ\, 14'.$$
Hence the mean distance = 360/23.233 = 15.5 units. The observed mean distance
is 20.0 units, but this is based on rectangular variation, and we are, perhaps, entitled
to expect some difference from normal theory. For rectangular random variables,
values distant from the mean occur more frequently, and it is not surprising to find
oscillations in the series which do not result in upcrosses.

Fig. 46.1—Graph of the last 117 terms of the series S of Table 46.2

The number of peaks in the series will be found to be 62, the first at the seventh
term, the last at the 466th. Hence the mean distance between peaks is 459/61 = 7.5
units. From formula (46.59) we find

$$\cos\theta_1 = \tfrac{2}{3}, \qquad \theta_1 = 48^\circ\, 11'.$$

Thus the theoretical mean distance is 360/48.187 = 7.5 units, in good agreement with
experiment. It will be observed that several of the distances between peaks are due
to very small ripples.
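The two theoretical distances of Example 46.6 follow directly from (46.54) and (46.59). A minimal sketch, with hypothetical function names, is given below; the peak formula is obtained by applying the upcross formula to the first differences of the weights, bordered by $a_1$ at the start and $-a_k$ at the end, which is the structure of (46.59).

```python
import math


def mean_upcross_distance(a):
    """2*pi/theta, with cos(theta) from (46.54) for weights a_1, ..., a_k."""
    cos_theta = sum(x * y for x, y in zip(a, a[1:])) / sum(x * x for x in a)
    return 2 * math.pi / math.acos(cos_theta)


def mean_peak_distance(a):
    """2*pi/theta_1, with cos(theta_1) from (46.59).

    The differenced series u_{t+1} - u_t is itself a moving average of the
    random series with weights a_1, a_2 - a_1, ..., a_k - a_{k-1}, -a_k.
    """
    d = [a[0]] + [y - x for x, y in zip(a, a[1:])] + [-a[-1]]
    return mean_upcross_distance(d)


weights = [1, 2, 3, 3, 3, 2, 1]          # sum of fives of a sum of threes
print(round(mean_upcross_distance(weights), 2))
print(round(mean_peak_distance(weights), 2))
```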
46.22 Let us now examine how the variance of the induced oscillation compares
with the variance of the original random series.

The sum of $k$ random elements with variance $v$ has variance $kv$ and its mean has
variance $v/k$. It does not follow that a simple moving average has a variance $1/k$
times that of the random element, because of correlations between successive members
in the derived series. If the original series was $\varepsilon_1, \ldots, \varepsilon_n$, the derived series is, with
weights $a_1, \ldots, a_k$,

$$\Sigma a_j \varepsilon_j = u_1, \text{ say}$$
$$\Sigma a_j \varepsilon_{j+1} = u_2, \text{ say} \qquad (46.62)$$
$$\Sigma a_j \varepsilon_{j+n-k} = u_{n-k+1}.$$
The expected value of the sum of these values is zero since the expected value of $\varepsilon$
may be taken to be so. Since there are $n-k+1$ terms we have for the variance
$$\frac{1}{n-k+1} \sum_{j=1}^{n-k+1} u_j^2. \qquad (46.63)$$
The expected value of this, since the $\varepsilon$'s are independent, is
$$\frac{1}{n-k+1} \sum_{j=1}^{n-k+1} E(u_j^2) = E(u^2) = v \sum_{j=1}^{k} a_j^2. \qquad (46.64)$$

In particular, if the $a_j$ are all equal to $1/k$, the expected value of the variance is $v/k$.
This gives us the average reduction in the variance.

If a simple average of extent $k$ is iterated $q$ times the weights are the successive
coefficients in

$$\frac{1}{k^q}(1 + x + x^2 + \ldots + x^{k-1})^q = \frac{1}{k^q}\, \frac{(1 - x^k)^q}{(1 - x)^q}.$$
The sum of squares of these coefficients is the coefficient of $x^{q(k-1)}$ in

$$\frac{1}{k^{2q}}\, \frac{(1 - x^k)^{2q}}{(1 - x)^{2q}} \qquad (46.65)$$

and this gives the average reduced variance for a simple average of $k$ iterated $q$ times.
The following are the values of the reducing factor for some of the values of $k$ and $q$:

      |  q = 1 |   2  |   3  |   4  |   5
k = 3 |  0.33  | 0.23 | 0.19 | 0.17 | 0.15
k = 4 |  0.25  | 0.17 | 0.14 | 0.12 | 0.11
k = 5 |  0.20  | 0.14 | 0.11 | 0.10 | 0.09
k = 6 |  0.17  | 0.11 | 0.09 | 0.08 | 0.07
k = 7 |  0.14  | 0.10 | 0.08 | 0.07 | 0.06

Evidently the result of the first moving average is to generate a series with a much 
lower variance than that of the original random element, but the second and succeeding 
iterations do not reduce the variance further to the same extent. In the case k = 7 
the first averaging reduces the variance to one-seventh, but the next three reduce it 
only by a further half. 
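The reducing factors tabulated above are the sums of squares of the iterated weights, and can be recomputed as in the following sketch (function names are hypothetical; exact fractions are used before rounding to two decimals).

```python
from fractions import Fraction


def iterated_weights(k, q):
    """Integer coefficients of (1 + x + ... + x^(k-1))^q.

    Dividing each coefficient by k^q gives the weights of a simple
    average of extent k iterated q times.
    """
    coeffs = [1]
    for _ in range(q):
        new = [0] * (len(coeffs) + k - 1)
        for i, c in enumerate(coeffs):
            for j in range(k):
                new[i + j] += c
        coeffs = new
    return coeffs


def variance_factor(k, q):
    """Sum of squares of the weights: the factor multiplying v in (46.64)."""
    return Fraction(sum(c * c for c in iterated_weights(k, q)), k ** (2 * q))


for k in range(3, 8):
    row = [f"{float(variance_factor(k, q)):.2f}" for q in range(1, 6)]
    print(f"k = {k}:", "  ".join(row))
```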
46.23 It is also instructive to consider the effect of the moving average on serial
correlations in residuals. For the series (46.51) generated by a moving average on
a random series we have, as at (46.54),

$$\operatorname{cov}(u_t, u_{t+s}) = E\{\Sigma a_j \varepsilon_{t+j-1} \cdot \Sigma a_j \varepsilon_{t+s+j-1}\} = v \sum_{j=1}^{k-s} a_j a_{j+s} \qquad (46.66)$$
and thus for the $s$th serial correlation of the resultant series
$$r_s = \sum_{j=1}^{k-s} a_j a_{j+s} \Big/ \sum_{j=1}^{k} a_j^2, \quad |s| < k, \qquad (46.67)$$
$$\quad = 0, \quad |s| \ge k.$$

Thus, for an infinite series generated in this way we see that, whereas the original
(random) series had zero serial correlations, the induced series is serially correlated
up to order $k$, i.e. as long as terms in the generated series have any terms of the original
series in common.

For example with a simple moving average of extent $k$, all the $a$'s are equal to $1/k$,
and from (46.67) we easily find

$$r_s = 1 - \frac{s}{k}, \qquad (46.68)$$

so that the correlation may be quite high for $s = 1$ and falls off linearly, as $s$ increases,
to zero at $s = k$. High correlations of this kind between neighbouring values are
responsible for the Slutzky-Yule effect.


Example 46.7 
The weights of the Spencer 21-point formula are
$$\tfrac{1}{350}[-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60].$$
Apart from the divisor 350, which may be disregarded for present purposes, the sum
of squares of weights is 17,542. The products (46.66) and the corresponding serial
correlations are as follows:

 k | $\Sigma a_j a_{j+k}$ | $r_k$  |  k | $\Sigma a_j a_{j+k}$ | $r_k$
 0 | 17,542 |  1.000 | 11 | -930 | -0.053
 1 | 16,786 |  0.957 | 12 | -528 | -0.030
 2 | 14,667 |  0.836 | 13 | -214 | -0.012
 3 | 11,584 |  0.660 | 14 |  -27 | -0.002
 4 |  8,085 |  0.461 | 15 |   50 |  0.003
 5 |  4,726 |  0.269 | 16 |   59 |  0.003
 6 |  1,951 |  0.111 | 17 |   40 |  0.002
 7 |      6 |  0.000 | 18 |   19 |  0.001
 8 | -1,074 | -0.061 | 19 |    6 |  0.000
 9 | -1,430 | -0.082 | 20 |    1 |  0.000
10 | -1,298 | -0.074 | 21 |    0 |  0.000

The correlogram is shown in Fig. 46.2. From $k = 13$ onwards the correlations are
very small, and from $k = 21$ onwards they vanish completely.

Fig. 46.2—Correlogram of series generated by the Spencer 21-point formula
(Example 46.7)

The variate-difference method 

46.24 The concept of a series which consists of a polynomial element plus a 
residual of a more or less random kind has given rise to a method which purports to 
eliminate the former by differencing. Clearly, successive differencing will eventually 
entirely eliminate any element which is actually a polynomial in the time, and may be 
relied upon almost to eliminate any systematic element except, perhaps, exponential 
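or cyclical terms. Let us consider the effect of differencing upon a random series
$\varepsilon_t$. We have
$$\Delta^r \varepsilon_t = \varepsilon_{t+r} - \binom{r}{1} \varepsilon_{t+r-1} + \binom{r}{2} \varepsilon_{t+r-2} - \ldots + (-1)^r \varepsilon_t$$
$$\quad = (U - 1)^r \varepsilon_t. \qquad (46.69)$$
Taking, without loss of generality, $\varepsilon_t$ to have zero mean, we have
$$E(\Delta^r \varepsilon) = 0, \qquad (46.70)$$

and if $\varepsilon_t$ has the same variance $v$ for all $t$,

$$\operatorname{var}(\Delta^r \varepsilon) = v \sum_{j=0}^{r} \binom{r}{j}^2$$
$$\quad = v \times \text{coeff. of } x^r \text{ in } (1+x)^r (x+1)^r$$
$$\quad = v \binom{2r}{r}. \qquad (46.71)$$

We may then derive an estimate of $v$ by writing

$$v = E(\Delta^r \varepsilon)^2 \Big/ \binom{2r}{r}. \qquad (46.72)$$

It is to be noticed that we use the second moment about zero, not the observed variance
of $\Delta^r \varepsilon$, since the mean is known to be zero. This shortens the arithmetic to some
extent.

The factor $\binom{2r}{r}$ for $r = 1$ to 10 has the following values:

 r | $\binom{2r}{r}$ | $1/\binom{2r}{r}$
 1 |       2 | 0.5
 2 |       6 | 0.166 667
 3 |      20 | 0.05
 4 |      70 | 0.014 285 7
 5 |     252 | 0.003 968 25
 6 |     924 | 0.001 082 25
 7 |   3,432 | 0.000 291 375
 8 |  12,870 | 0.000 077 700 1
 9 |  48,620 | 0.000 020 567 7
10 | 184,756 | 0.000 005 412 54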

46.25 Basing itself on equation (46.72), the method of variate differences proceeds 
as follows. We difference the series once, find the second moment about zero of 
the resultant, and divide by 2; we then difference again and find the second moment 
about zero, dividing in this case by 6; and so on. If the successive estimates of v decrease
we continue with the differencing. There will, in general, come a point when they 
cease decreasing and remain constant within sampling limits (which may be rather 
wide). At this stage we may suppose that we have eliminated the systematic element 
in the original series. The final estimate gives us an estimate of the variance of the 
random element in the original series, and the order of the difference to which we have 
had to go will give an indication of the degree of the polynomial representing the 
systematic component. 


Example 46.8 

Let us apply the variate-difference technique to the series of Table 46.1. We 
know from the method of constructing the series that the systematic part ought to be 
completely eliminated after the third differencing, and also that the random part con- 
sists of an element with variance 833 approximately. In fact, the random numbers 
from 1 to $N$ have a variance $(N^2 - 1)/12$, and $N$ in this case is 100. The actual variance
of the random element in Table 46.1 is 843. 
Table 46.3 shows the series and the differences up to $\Delta^6$. For the sums of squares
in the various columns $S_j$ corresponding to $\Delta^j$, we find:

$S_1$ = 107,541
$S_2$ = 318,115
$S_3$ = 1,033,513
$S_4$ = 3,445,308
$S_5$ = 11,720,069
$S_6$ = 40,548,844

To obtain second moments we divide by $51 - j$ and then, to obtain the estimate of
$v$, by $\binom{2j}{j}$. We find the following:

j | Estimate
1 | 1075.41
2 | 1082.02
3 | 1076.58
4 | 1047.21
5 | 1011.05
6 |  975.20

Curiously enough, the estimate for j = 2 is higher than that for j = 1, and there 
is little difference between the various estimates. In the ordinary way we should have 
concluded that the systematic component was adequately represented by a polynomial 
of order 1, that is to say a straight line, and that the residual random element had a 
variance of about 1000. 

The reader must not be surprised to find discrepancies of this kind between theory 
and experiment in short series; and the discrepancy is not, in fact, as big as it seems. 
The variance of the original series is 6272.61. The mean square of the first difference,
divided by 2, is 1075.41, so that about five-sixths of the variance has been eliminated
by the first differencing; and the method indicates, quite correctly, that the greater
part of the systematic element is linear. The random element is rather large compared
with the non-linear systematic terms, and the latter have got caught up in it:
the series is too short for the variate-difference method to disentangle them. Consider,
for instance, the cubic term $(t-26)^3/100$. In the original series this varies
in value from −156.25 to +156.25. First differences reduce it to $3(t-26)^2/100$,
varying from 18.75 through zero to 18.75, whereas the random element is increased
in range from 0 to 198. Already the systematic term is being swamped by the random
element, and a slight degree of accidental correlation between the two can easily account 
for the increase in the mean square of second differences. 

The matter may be put in a slightly different way. Suppose that, relying on the 
variate-difference method, we regarded the data as represented by a linear equation 
plus a random residual. If we fitted a straight line by least squares and examined 
the residuals, we should probably find very little evidence of departure from random- 
ness. This representation would differ from the mode of construction of the series, 
but it would be a possible method of construction. Only the failure of the representation
to conform to further terms of the series would reveal its weakness.


46.26 The variate-difference method thus provides a kind of lower limit to the 
degree of the polynomial which will represent a series locally or generally. There 
remains for consideration the question as to what sort of differences between successive 
estimates of v can be regarded as chance effects, in order to decide when the value has 
reached a stationary level. The sum of squares $S_j$ is a constant factor times the second
moment, but as its members are correlated among themselves we cannot use the variance
of the second moment to test its significance. Further, $S_j$ and $S_{j+1}$ are correlated.
We proceed to derive the sampling variance of their difference, the somewhat complicated
formulae being due to O. Anderson (1914).


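46.27 Write

$b_j = \binom{r}{j}.$ (46.73)

Then we have

$E(\Delta^r u)^2 = \Big(\sum_{j=0}^{r} b_j^2\Big)\mu_2 = \binom{2r}{r}\mu_2,$ (46.74)

where $\mu_2$ is the variance of u. Further,

$E\{\sum(\Delta^r u)^2\}^2 = E[\{b_0 u_1 - b_1 u_2 + b_2 u_3 - \dots + (-1)^r b_r u_{r+1}\}^2$
$\qquad + \{b_0 u_2 - b_1 u_3 + b_2 u_4 - \dots + (-1)^r b_r u_{r+2}\}^2$
$\qquad + \dots + \{b_0 u_{n-r} - b_1 u_{n-r+1} + \dots + (-1)^r b_r u_n\}^2]^2.$ (46.75)

Consider first of all the terms in this which result in fourth powers of u. They
will derive from

$E[b_0^2 u_1^2 + b_1^2 u_2^2 + \dots + b_r^2 u_{r+1}^2$
$\qquad + b_0^2 u_2^2 + b_1^2 u_3^2 + \dots + b_r^2 u_{r+2}^2$
$\qquad + \dots + b_0^2 u_{n-r}^2 + b_1^2 u_{n-r+1}^2 + \dots + b_r^2 u_n^2]^2.$ (46.76)

Writing now

$B_0^2 = (b_0^2)^2 + (b_0^2 + b_1^2)^2 + \dots + (b_0^2 + b_1^2 + \dots + b_{r-1}^2)^2,$ (46.77)

$A_0^2 = (b_0^2 + b_1^2 + \dots + b_r^2)^2 = \binom{2r}{r}^2,$ (46.78)

we find that the term in $E(u^4)$ is

$\{A_0^2(n-2r) + 2B_0^2\}E(u^4).$ (46.79)

The only other terms appearing from (46.75) will be of type $E(u_l^2 u_m^2)$, $l \neq m$. If the reader
will write out the expansion of (46.75) he will find that the coefficients are expressible
in terms of

$A_k^2 = (b_0 b_k + b_1 b_{k+1} + \dots + b_{r-k} b_r)^2 = \binom{2r}{r+k}^2$ (46.80)

and

$B_k^2 = (b_0 b_k)^2 + (b_0 b_k + b_1 b_{k+1})^2 + \dots + (b_0 b_k + b_1 b_{k+1} + \dots + b_{r-k-1} b_{r-1})^2.$ (46.81)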
The expression for $E\{\sum(\Delta^r u)^2\}^2$ reduces to
(n—2r) A? E(u*) +4 ((n —2r + 1)A} + (n—2r+2)A3+ 
+A} (n—2r +1} Blut us) +23 E(u) 
+8{Bi+ B3+ ... + В2) E(u? uz). (46.82) 
Substituting $\mu_4$ for $E(u^4)$ and $\mu_2^2$ for $E(u_l^2 u_m^2)$, dividing by $(n-r)^2\binom{2r}{r}^2$ and subtracting
$\mu_2^2$, we find the sampling variance of the estimate of v. The expression can, however,
be simplified to some extent. Putting


n= EQ (i) 220) Gia LN) 
„00: өө 


we find, after lengthy algebraic rearrangement, 
S, ma 38 [,_ 27, 


(n—-r) E E TER (n— XT) 
(C) omm rem ee 


If terms of order $(n-r)^{-2}$ can be neglected, this reduces to

$\operatorname{var}\left\{\frac{S_r}{(n-r)\binom{2r}{r}}\right\} = \frac{\mu_4 - 3\mu_2^2}{n-r} + \frac{2\mu_2^2}{n-r}\binom{4r}{2r}\Big/\binom{2r}{r}^2$ (46.85)

or, using the Stirling approximation to factorials,

$\frac{1}{n-r}\{\mu_4 - 3\mu_2^2 + \mu_2^2\sqrt{(2\pi r)}\},$ (46.86)

which is a fair approximation to (46.85), being within 3 per cent for r as low as 6.
When the population of values of u is normal, $\mu_4 - 3\mu_2^2$ vanishes and the formula
simplifies accordingly.
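
The quality of the Stirling approximation is easy to check numerically. The sketch below is ours, in Python, and rests on an assumption about the formulae: that in the normal case, with terms of order $(n-r)^{-2}$ neglected, (46.85) is $\mu_2^2/(n-r)$ multiplied by $2\binom{4r}{2r}/\binom{2r}{r}^2$, and that (46.86) replaces this multiplier by $\sqrt{(2\pi r)}$. The function names are illustrative only.

```python
from math import comb, pi, sqrt


def exact_factor(r):
    """2 * C(4r, 2r) / C(2r, r)^2: assumed multiplier of mu_2^2 / (n - r) in (46.85)."""
    return 2 * comb(4 * r, 2 * r) / comb(2 * r, r) ** 2


def stirling_factor(r):
    """sqrt(2 * pi * r): the corresponding multiplier after Stirling's approximation."""
    return sqrt(2 * pi * r)


def relative_error(r):
    """Proportion by which the Stirling form falls short of the exact factor."""
    return (exact_factor(r) - stirling_factor(r)) / exact_factor(r)


for r in (2, 4, 6, 10):
    print(r, round(exact_factor(r), 4), round(stirling_factor(r), 4),
          f"{100 * relative_error(r):.2f} per cent")
```

On this reading the shortfall at r = 6 is close to 3 per cent and diminishes as r increases, in line with the statement in the text.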


46.28 In a similar way it may be shown that 


5, EN —— Ж 2T, 
ppc FRY ег 2г+2 s ЕЗ8 1— +2 TIEN 
r r+1 r]\r+1 
ey 
+218 2r 2n—2r-1_— r41 (46.87) 
aT 


fo (2r+2\ a—r-1 _ 2(з—т—1)[” 

т /\т+1 

3 r-l rM r+1 2 r—2 r 2 r+1 2 rM r+1 2 
Е (аи (E d CAE bos , 
А E) (9) +23 () (з) ч +) (3) 


where 
We can now determine the variance of the difference of
S, Sia 


lE Збан T Lecce 
2г 2r4-2 

«-»(7) er) 
The general formula is complicated, but for normal variation, large n and $r \geq 6$ we have,
analogously to (46.86),
(ADV) ( S$ y 
Agr i-r-D6 (”) . (46.88) 

r 


var (difference) = 


The arithmetic application of the formulae has been facilitated by the preparation of 
tables of the constants involved. Reference may be made to Tintner (1940) who gives 
tables prepared by himself, O. Anderson and Zaycoff. We shall consider below some
further modifications which simplify the formulae to a certain extent. 


Example 46.9 
For the data of Table 45.4 (sheep population) an application of the variate-difference 
method up to the tenth difference gave the following results: 


r s/f 


© оосо UN=e 
B 


- 


The values here are falling steadily from r = 1 to r = 10, but very slightly towards
the end. From (46.88) for r = 6 we have for the variance of the difference, 80.7
approximately, and for r = 10, 25.8 approximately. It appears that the reduction in
variance at r = 8 is losing significance. It does not, of course, follow that the trend-line
must be of this degree, for we may not want to eliminate the oscillatory movements
in the trend-line. In fact, we should not leave much behind if we eliminated trend
by a cubic.


46.29 The variate-difference method will clearly not eliminate systematic effects 
such as periodic terms with very short period. Consider, for instance, the series 
1, —1, 1, —1, etc. The first differences give us a series 2, —2, 2, —2, etc., second 
differences 4, −4, 4, −4, etc., and so on. The variance of the series of rth differences
is, neglecting effects due to the shortness of the series, $2^{2r}$ times that of the original,
and the quotient when this is divided by $\binom{2r}{r}$ tends to

$\frac{2^{2r}(r!)^2}{(2r)!}$

and so increases without limit. In such a case we cannot obtain an estimate of the
variance of any random element which may be present. 
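
The behaviour described here can be seen directly on the alternating series. The following short Python sketch is ours: it differences the series 1, −1, 1, −1, ... repeatedly, divides the mean square of the rth differences by $\binom{2r}{r}$, and compares the result with $2^{2r}(r!)^2/(2r)!$. The helper name `inflated_estimate` is illustrative.

```python
import numpy as np
from math import comb, factorial


def inflated_estimate(series, r):
    """Mean square of the r-th differences divided by C(2r, r)."""
    d = np.diff(np.asarray(series, dtype=float), n=r)
    return float(np.mean(d ** 2) / comb(2 * r, r))


alternating = [(-1) ** t for t in range(60)]  # 1, -1, 1, -1, ...

for r in range(1, 7):
    limit = 4 ** r * factorial(r) ** 2 / factorial(2 * r)
    print(r, inflated_estimate(alternating, r), limit)
```

Since the original series has mean square unity, the printed quotient would be an "estimate" of a random variance that is in fact absent, and it grows steadily with r instead of settling down.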


The problem of testing differences between $S_r$ and $S_{r+1}$, or the equivalent problem
of testing whether the ratio $S_r/S_{r+1}$ is near unity, is complicated by correlations between
the differences which compose these quantities. Tintner (1940) and Johnson (1948) 
have suggested methods of overcoming the difficulty, but they involve sacrificing a large 
proportion of the data. 


46.30 There is an intimate connexion between the variance of the differences of 
a series and its serial correlations. We have for a series of n terms 


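$\sum_{j=1}^{n-1}(\Delta u_j)^2 = \sum_{j=1}^{n-1}(u_{j+1}-u_j)^2 = \sum_{j=1}^{n-1}\{(u_{j+1}-\bar{u})-(u_j-\bar{u})\}^2$
$\qquad = \sum_{j=1}^{n-1}(u_{j+1}-\bar{u})^2 - 2\sum_{j=1}^{n-1}(u_{j+1}-\bar{u})(u_j-\bar{u}) + \sum_{j=1}^{n-1}(u_j-\bar{u})^2.$

Approximately, then, on division by n−1,

$\operatorname{var}\Delta u = \operatorname{var}u - 2\operatorname{cov}(u_j, u_{j+1}) + \operatorname{var}u = 2\operatorname{var}u\,(1-r_1).$ (46.89)

To the same degree of approximation,

$\operatorname{var}\Delta^2 u = E(u_{j+2} - 2u_{j+1} + u_j)^2 = \operatorname{var}u\,(6 - 8r_1 + 2r_2).$ (46.90)

Likewise, we have in general

$\operatorname{var}\Delta^p u = \operatorname{var}\Big\{\sum_{j=0}^{p}(-1)^j\binom{p}{j}u_{t+j}\Big\}$
$\qquad = \operatorname{var}u\,\Big\{\binom{2p}{p} - 2r_1\binom{2p}{p-1} + 2r_2\binom{2p}{p-2} - \dots\Big\}$
$\qquad = \operatorname{var}u\,\binom{2p}{p}\Big\{1 - \frac{2p}{p+1}r_1 + \frac{2p(p-1)}{(p+1)(p+2)}r_2 - \dots\Big\}.$ (46.91)

We can similarly express the serial correlations in terms of variances of the differences.
Put

$V_j = \frac{\operatorname{var}\Delta^j u}{\binom{2j}{j}}, \qquad V_0 = \operatorname{var}u.$ (46.92)

Then it may be shown (cf. Exercise 46.14) that

$r_p V_0 = V_0 - p^2 V_1 + \frac{p^2(p^2-1)}{(2!)^2}V_2 - \frac{p^2(p^2-1)(p^2-4)}{(3!)^2}V_3 + \dots$ (46.93)
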
46.31 The relative simplicity of these formulae is due to the fact that we have
dealt with a long series, i.e. have neglected certain end-effects, writing, for example, 


We can preserve the formulae as exact results if we are prepared to make some small 
modifications to the definitions so as to incorporate these end-effects from the outset. 
Define a series of summation functions Eg) by the formulae 


Eo (5; Yi) = MY aya. Ys Ynca Yn Yn (46.94) 
Eogy(xy) = PYH ayat -o + Xp дуаа Bs (46.95) 
Eoy(sy) = Bey хауа Meat «++ ха ana Bernt Han (46.96) 
The general law of formation obeys the recurrence rule 
n a-l 
Eo uy) =# Хто (Jit tiai) (46.97) 


so that, for example, the first three terms in X, have coefficients 4, $, 4. 
Now define the modified quantities 


mVo = Xo u*/n (46.98) 
mV, = Emn (A)3/2n (46.99) 
„Vya Eup (At / (Фу. (46.100) 
Likewise define 
mp = Lemp) Mi ip / Zim i - (46.101) 


Then these quantities obey exactly the relations (46.91) and (46.93). The simplest 
way of seeing this, perhaps, is to consider the series 
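$\dots, 0, 0, 0, u_1, u_2, \dots, u_n, 0, 0, 0, \dots$ (46.102)

The first differences are

$\dots, 0, 0, u_1, u_2-u_1, \dots, u_n-u_{n-1}, -u_n, 0, 0, \dots$ (46.103)

and their sum of squares is

$u_1^2 + \sum_{j=1}^{n-1}(u_{j+1}-u_j)^2 + u_n^2 = 2\sum_{j=1}^{n}u_j^2 - 2\sum_{j=1}^{n-1}u_j u_{j+1} = 2n\,{}_mV_0(1 - {}_mr_1).$ (46.104)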
In fact, for such a series, (46.97) is equivalent to 
E щик = F Bin-n (шшш шала) 


= (Хуш) (т) 
and the argument by which we arrived at (46.91) and (46.93) holds exactly for the 
infinite series, hence for the infinite series (46.102) and hence for our modified V's 
and r's. 
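
The exactness claimed for the zero-extended series can be verified numerically. The sketch below is ours and makes two assumptions that should be stated plainly: the series is extended by zeros on both sides, as in (46.102), and all moments are taken about zero, so that the lagged product sums are simply $c_k = \sum u_i u_{i+k}$. It then checks that the sum of squares of the pth differences of the extended series equals $\binom{2p}{p}c_0 - 2\binom{2p}{p-1}c_1 + 2\binom{2p}{p-2}c_2 - \dots$, which is the form of (46.91) before division. The function names are illustrative.

```python
import numpy as np
from math import comb


def padded_sum_of_squares(u, p):
    """Sum of squares of the p-th differences of ..., 0, 0, u_1, ..., u_n, 0, 0, ..."""
    u = np.asarray(u, dtype=float)
    # p zeros on each side are enough to capture every non-zero p-th difference.
    padded = np.concatenate([np.zeros(p), u, np.zeros(p)])
    return float(np.sum(np.diff(padded, n=p) ** 2))


def from_lagged_products(u, p):
    """C(2p,p) c_0 - 2 C(2p,p-1) c_1 + 2 C(2p,p-2) c_2 - ..., with c_k = sum u_i u_{i+k}."""
    u = np.asarray(u, dtype=float)
    n = len(u)
    if n <= p:
        raise ValueError("the series must be longer than the order of differencing")
    c = [float(np.dot(u[: n - k], u[k:])) for k in range(p + 1)]
    total = comb(2 * p, p) * c[0]
    for k in range(1, p + 1):
        total += 2 * (-1) ** k * comb(2 * p, p - k) * c[k]
    return total


rng = np.random.default_rng(0)
series = rng.normal(size=25)
for p in range(1, 6):
    print(p, padded_sum_of_squares(series, p), from_lagged_products(series, p))
```

The two columns agree to rounding error for every p, whereas for the unextended series the corresponding relation holds only approximately because of the end-effects.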


46.32 As we have approached the matter, the variate-difference method has 
been used primarily to examine the order of the polynomial of best fit, a point being 
reached when the quantities V do not seriously change for higher differences. But 
on the assumption that the original series consisted of a polynomial plus a random 
error we can also enquire, given a set of V's, what is the best estimate of the error
variance. The question has been examined by Quenouille (1953b), who seeks a linear 
function of the V’s which has minimal variance. In most practical cases it is more 
realistic to consider the possibility of serial correlation among the errors. Given 
a series consisting of a polynomial plus a serially correlated error, the problem 
is then to extend the variate-difference method so as to estimate the serial correlations. 
This question was also considered by Quenouille. Cf. Exercises 46.9-11. 


46.33 But, however we approach the subject—by fitting polynomials directly, 
by moving averages, or by any other smoothing process—we encounter the difficulties 
mentioned in 46.18. The trend elimination will distort the residuals. There seems 
no escape from this situation. We can only hope to make the best of it, and this we 
can do in two ways: by choosing methods which, other things being equal, minimize 
the distortion; and by arranging our procedure so that, if we have misgivings at any 
stage of the later analysis, we can disentangle the distortion due to smoothing from 
other elements in the residual series. We proceed to examine the possibilities of the 
second line of attack. 


46.34 Let us suppose that we divide our series of n terms into consecutive sets 
of s, and fit a polynomial of the same type to each set. Within any one set we may 
get a satisfactory trend-line. But clearly the line for any set must join on to the line 
for the next, and, in some acceptable sense, smoothly so. Subject to this matter, 
which we shall examine in a moment, such a method has the advantage that it treats 
the series as a set of independent blocks, and we can apply an analysis of variance to 
them. The method may be regarded as a compromise between the moving average 
and fitting a polynomial to the whole series. It was considered at length by Rhodes 
(1921) and has been extended by Quenouille (1949a).

As a simple example of the method, consider the fitting of straight lines to sets
of three points. If the fitted value at $u_1$ is $2b_1$, and that at $u_3$ is $2b_2$, the value at $u_2$ must be
$b_1+b_2$. If, further, the fitted value to $u_5$ is $2b_3$ (that at $u_3$ being already determined as
$2b_2$), the value at $u_4$ is $b_2+b_3$, and so on, the values being

actual   $u_1$     $u_2$       $u_3$     $u_4$       $u_5$     $u_6$       $u_7$   ...
fitted   $2b_1$   $b_1+b_2$   $2b_2$   $b_2+b_3$   $2b_3$   $b_3+b_4$   $2b_4$  ...   (46.105)

The trend-line so determined will be continuous, but its first derivative will be discontinuous
at $u_3$, $u_5$, $u_7$, etc., i.e. it consists of a series of straight lines of extent three.


46.35 The actual values of the constants b may be determined in the usual way
by least squares, i.e. we may minimize

$(u_1-2b_1)^2 + \{u_2-(b_1+b_2)\}^2 + (u_3-2b_2)^2 + \text{etc.},$ (46.106)

giving the set of equations

$2u_1 + u_2 - 5b_1 - b_2 = 0$
$u_2 + 2u_3 + u_4 - b_1 - 6b_2 - b_3 = 0$ (46.107)
etc.

The equations are not difficult to solve, but once again they can be simplified very
much by a suitable modification of end-effects. Let us, as usual, consider an odd
number of terms $u_1, \dots, u_{2m+1}$ and modify our series to

$\tfrac12(u_1+u_{2m+1}),\; u_2,\; u_3,\; \dots,\; u_{2m},\; \tfrac12(u_1+u_{2m+1})$ (46.108)

with fitted constants

$b_1+b_m,\; 2b_1,\; b_1+b_2,\; \dots,\; 2b_m,\; b_1+b_m.$ (46.109)

This is analogous to the procedure of 45.34, in which we take the last term as equal
to the first and hence render the series "circular." Writing $u_1'$ for $\tfrac12(u_1+u_{2m+1})$,
we have to minimize the sum

$\{u_1'-(b_1+b_m)\}^2 + (u_2-2b_1)^2 + \dots + (u_{2m}-2b_m)^2$

leading to the equations

$6b_1 + b_2 + b_m = u_1' + 2u_2 + u_3 = U_2$, say
$b_1 + 6b_2 + b_3 = u_3 + 2u_4 + u_5 = U_4$, say
$b_2 + 6b_3 + b_4 = u_5 + 2u_6 + u_7 = U_6$, say (46.110)
$\dots$
$b_1 + b_{m-1} + 6b_m = u_{2m-1} + 2u_{2m} + u_1' = U_{2m}$, say.

The advantage of this form is that the coefficients of the b's form a symmetrical circulant
matrix which can be solved once and for all. (For the method see Quenouille, 1949a,
and Good, 1950; cf. Exercise 46.13.) The result is to express the b's in terms of the
linear functions of the U's in (46.110).

Exercise 46.12 generalizes the results above. 
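
The circular form of the equations lends itself to a compact numerical solution. The following Python sketch is ours and is offered only as an illustration: it assumes an odd number of terms $n = 2m+1$ with $m \geq 3$, replaces the two end terms by their mean, forms the quantities $U_2, U_4, \dots, U_{2m}$, and solves the circulant system (diagonal 6, neighbours 1) through the discrete Fourier transform, in the spirit of Exercise 46.13. The function name `fit_triads` and the use of NumPy's FFT routines are our choices, not the authors'.

```python
import numpy as np


def fit_triads(u):
    """Fit straight lines to sets of three points on the 'circular' series.

    Returns (b, U, fitted), where fitted refers to the 2m points
    u_1', u_2, ..., u_2m of the modified series.
    """
    u = np.asarray(u, dtype=float)
    n = len(u)
    if n % 2 == 0 or n < 7:
        raise ValueError("need an odd number of terms, at least 7")
    m = (n - 1) // 2

    # Modified series: the first and last terms are replaced by their mean.
    w = u[:-1].copy()
    w[0] = 0.5 * (u[0] + u[-1])

    # U_{2j} = u_{2j-1} + 2 u_{2j} + u_{2j+1}, with circular wrap-around at the end.
    i = np.arange(m)
    U = w[2 * i] + 2 * w[2 * i + 1] + w[(2 * i + 2) % (2 * m)]

    # Circulant coefficient matrix: 6 on the diagonal, 1 on either side (circularly).
    first_column = np.zeros(m)
    first_column[0], first_column[1], first_column[-1] = 6.0, 1.0, 1.0
    latent_roots = np.fft.fft(first_column)          # equal to 6 + 2 cos(2 pi s / m)
    b = np.real(np.fft.ifft(np.fft.fft(U) / latent_roots))

    fitted = np.empty(2 * m)
    fitted[1::2] = 2 * b                              # even-numbered points: 2 b_j
    fitted[0::2] = b + np.roll(b, 1)                  # odd-numbered points: b_{j-1} + b_j
    return b, U, fitted
```

As a usage example, a series built exactly from the pattern (46.109) should be recovered without error:

```python
b_true = np.array([1.0, -2.0, 0.5, 3.0, 4.0])
pattern = np.empty(10)
pattern[1::2] = 2 * b_true
pattern[0::2] = b_true + np.roll(b_true, 1)
b_hat, U, fitted = fit_triads(np.append(pattern, pattern[0]))
print(np.round(b_hat, 6))
```

The latent roots $6 + 2\cos(2\pi s/m)$ are all at least 4, so the division is always safe.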


Example 46.10
We revert to the data of Table 46.1 with values of $u_t$ as given in column (4),
except that $\tfrac12(u_1+u_{51})$ has been substituted for $u_1$ and $u_{51}$. The values are repeated in
Table 46.4. The functions $U_j$ are shown in column (3) and the corresponding values
of b in column (4). Thus, for example,

$U_2 = 76.5 + 2(-90) + (-17) = -120.5$
$U_{50} = 221 + 2(270) + 76.5 = 837.5.$

Also

$6b_1 + b_2 + b_{25} = -242.2086 - 5.6002 + 127.3086$
$\qquad = -120.5002 = U_2$

and so on. The fitted values are immediately obtainable, e.g. for t = 2, $u_2 = -90$,
$2b_1 = -80.74$; for t = 3, $u_3 = -17$, $b_1+b_2 = -45.97$. For t = 1, $u_1 = 76.5$,
$b_1+b_{25} = 86.94$.

As usual in Least Squares fitting, we do not need to work out each residual in order
to calculate the sum of squares, for (cf. (35.1)) the Residual SS is

$\sum u^2 - \sum bU.$

In our present example

$\sum u^2 = 474,458.25$
$\sum b_j U_j = 448,274.26$
Difference $= 26,184.$

We have fitted 25 constants to 51 observations, one of which was adjusted, so there are
Table 46.4. Series of Table 46.1 fitted by straight lines to sets of threes


t ш Ur bi t ue ш bi 
| SP] | | 
1 76-5 | | DT eet | 
2 —90 | -1205 | —40-3681 28 37 1601 | 176740 
3 -17 29 12 | 
4 -32 -92 — 5:6002 30 96 274 39-4348 
5 -11 315 y 
6 -59 | -97 — 18-0309 al Гога 182 19-7171 
7 32 432 61.524] | 
8 | 28 110 | 167855 34 64 | 214 | 242620 
9 22 | 35 | 34 
10 62 196 | 273178 36 126 344 48:7106 
11 50 | 37 58 
12 cre 82 133 | 15-3071 38 | 57 267 | 274740 
13 79 39 95 
14 | -7 | 139 13-8390 | 40 75 | 403 53-4452 
15 7 | 41 158 | 
16 | 85 | 259 40-6586 42 99 454 54-8549 
17 | 15 | 43 | 9 
18 -4 68 12096 | 44 | 159 5722 | 714253 
19^ a | 45 | 156 | | 
20 39 | 10 | 20080 | 46 | 180 | 717 88-5930 
21 dedi 47 | 201 | 
22 51 156 | 182862 | 48 239 | 900 114-0164 
23 53 | 49 | 221 | | 
24 | 48 | 191 26-1987 50 | 270 8375 | 127-3086 
25 4 | St | 76-5 | 
26 | О 347 | 155012 Еа: al 
| | Тотиз| - | 65450 818-1244 


25 degrees of freedom in the estimate of residual variance. The estimator is then 
26,184/25 = 1047 against a value obtained (from first differences) in Example 46.9 
of 1075. 

But we can do better than this. The method of fitting lines to three points suggests
that there may be correlation between observed residuals in neighbouring points of a 
set of three, but not between sets. 

We can, in fact, regard the series as }(n—1) blocks of two, the two differences 
in a fitted triad having values typified by },—b,, ba—b,. The sum of squares within 
blocks is estimated by ;';(Zu;—2u;,,)? which is found to be 406-12. Thus we can 
analyse the total sum of squares as 


                     d.fr.        S.S.        Mean square
Fitted constants      25       448,274.26
Within blocks          1           406.12
Residual              24        25,777.87         1074

Total                 50       474,458.25


The residual mean square is now in almost exact agreement with the value obtained 
in Example 46.9 from first differences. In fact, the agreement is much closer than
we have any right to expect.

For a more extended account of this topic, reference should be made to Quenouille
(1949a) who has subsequently considered the problem of fitting so that the pieces of trend-line
join smoothly.


Seasonal variation 

46.36 In an analysis of seasonal variation we are presented with one major advantage
and one minor disadvantage. The advantage is that we know the period of the seasonal
recurrence to be one year. The disadvantage is that our observations are usually at
an even number of points, quarters, months, or weeks. Most of what we have to say 
about seasonal movements over a year can be applied without change to movements 
over other periods which are strictly cyclical, e.g. temperature movements over a day 
or price movements over a week. For simplicity, however, we shall confine ourselves 
to seasonality in the strict sense of the word. 


46.37 Seasonal movements are often sufficiently marked to need no demonstration. 
Cases sometimes occur, however, where we are not certain whether the movements 
in a series are due to random effects imposed on a trend or to a fluctuation of non- 
cyclical character, and in the first instance we require a test for the existence of season- 
ality. In any case we require a measure of the seasonal effects. 

The quarterly data of Table 46.5 (index-numbers of wholesale prices of vegetable 
food) will illustrate the possibilities. In Table 46.6 we have simplified the data by 
taking a new origin and scale. 


Table 46.5. Quarterly index numbers of the wholesale price of vegetable food
in the United Kingdom, 1951-8

(Data from the Journal of the Royal Statistical Society for
appropriate years. 1867-1877 = 100)

                 1951    1952    1953    1954    1955    1956    1957    1958
First quarter   295.0   324.7   372.9   354.0   333.7   323.2   304.3   312.5
2nd quarter     317.5   323.7   380.9   345.7   323.9   342.9   285.9   336.1
3rd quarter     314.9   322.5   353.0   319.5   312.8   300.3   292.3   295.5
4th quarter     321.4   332.9   348.9   317.6   310.2   309.8   298.7   318.4


Table 46.6. Data of Table 46.5 with origin 300, values multiplied by 10

                 1951    1952    1953    1954    1955    1956    1957    1958
First quarter     -50     247     729     540     337     232      43     125
2nd quarter       175     237     809     457     239     429    -141     361
3rd quarter       149     225     530     195     128       3     -77     -45
4th quarter       214     329     489     176     102      98     -13     184




46.38 Consider first of all the possibilities of distribution-free tests. It is tempt- 
ing to rank the quarters within any one year from 1 to 4 and consider how the ranks vary 
from year to year. A little reflection will show, however, that such a procedure does
not disentangle seasonal movement from trend. If the data were uniformly increasing 
in time the first quarter would always rank the lowest; but this is not a seasonal effect. 
In fact, to make any progress it appears necessary to make some attempt to eliminate 
trend as a first step. 

The kind of model that we have used so far in decomposing a time-series is of the
additive type, but there are obvious reasons for regarding seasonal effects as multiplicative.
Thus, if the series consists of a yearly value, say y, a seasonal value (constant
from year to year in proportional effect), say s, and a random error, the model is

$u_{tq} = y_t s_q + \eta_{tq}, \qquad t = 1, \dots, m; \quad q = 1, 2, 3, 4.$ (46.111)

If the trend is slow, so that the seasonal effect may be regarded as constant from year
to year in absolute (not proportional) magnitude, we can write approximately

$u_{tq} = y_t + s_q + \eta_{tq}$ (46.112)

which is an ordinary analysis of variance model with a two-way cross-classification.
This is apt to be an indifferent approximation if trend is at all appreciable, and it would
be safer to work with

$\log u_{tq} = \log y_t + \log s_q + \eta_{tq}.$ (46.113)

One simple way of applying (46.111) is to divide $u_{tq}$ by the average of the u's over the
year concerned and to regard the quotients as estimates of the seasonal multiplicative
factor. If we were to adopt this for Table 46.5, we should proceed as follows.

For 1951 the average of the four quarterly values is 312.2. Dividing this into the
quarterly values (and multiplying by 100), we get 94.49, 101.70, 100.86, 102.95. Similar
calculations for the other years yield:

                  1951    1952    1953    1954    1955    1956    1957    1958    Mean
First quarter    94.49   99.62  102.47  105.92  104.23  101.30  103.05   99.01   101.3
2nd quarter     101.70   99.31  104.66  103.44  101.17  107.48   96.82  106.49   102.6
3rd quarter     100.86   98.94   97.00   95.60   97.70   94.12   98.99   93.62    97.1
4th quarter     102.95  102.13   95.87   95.03   96.89   97.10  101.15  100.88    99.0


It seems reasonable to take an average of the individual quarters over years to estimate
the seasonal component (assumed constant). This is done in the last column. If
we wished to "correct" the original figures for seasonality we should divide each
first quarter's figure by 101.3, each second quarter's figure by 102.6, and so on.
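
The ratio-to-annual-mean calculation just described can be sketched in a few lines. The Python below is ours; the figures are those of Table 46.5 as we read them (rows are quarters, columns the years 1951-58), and the function name `ratio_to_annual_mean` is illustrative.

```python
import numpy as np

# Table 46.5: rows are quarters 1-4, columns are the years 1951-58.
PRICES = np.array([
    [295.0, 324.7, 372.9, 354.0, 333.7, 323.2, 304.3, 312.5],
    [317.5, 323.7, 380.9, 345.7, 323.9, 342.9, 285.9, 336.1],
    [314.9, 322.5, 353.0, 319.5, 312.8, 300.3, 292.3, 295.5],
    [321.4, 332.9, 348.9, 317.6, 310.2, 309.8, 298.7, 318.4],
])


def ratio_to_annual_mean(table):
    """Express each quarter as a percentage of its own year's mean,
    then average each quarter's percentages over the years."""
    annual_mean = table.mean(axis=0)          # one mean per year (column)
    ratios = 100.0 * table / annual_mean
    seasonal_index = ratios.mean(axis=1)      # one index per quarter (row)
    return ratios, seasonal_index


ratios, seasonal_index = ratio_to_annual_mean(PRICES)
print(np.round(ratios[:, 0], 2))      # the 1951 column
print(np.round(seasonal_index, 1))    # seasonal indexes for the four quarters
```

Dividing each row of `PRICES` by the corresponding index (over 100) would give the figures "corrected" for seasonality in the sense of the text.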


46.39 A second possibility is to use a moving average to eliminate trend before 
examining the residual values for seasonality. We then, of course, run into the danger 
of distorting the residuals. However, if we choose our moving average with care, 
we can minimize this effect so far as concerns seasonal effects. We noted, in fact, 
in 46.16 that if the simple moving average (with equal weights) is equal in extent to 
the period of a cyclical component, the trend-value of that component is zero, so that
the residual is unimpaired. 

For the data of Table 46.5, this involves the use of a simple moving average of 
fours, which will be adequate to remove a linear trend. Our original data, however, are 
the averages for a quarter’s prices and relate, then, to time periods of three months 
centred on February 15, May 15, August 15, November 15 (or thereabouts). The 
average of an (even) set of four will give us a trend-value at some point half-way between 
these dates. To bring the time-point of the average back to comparability with the 
originals we must "centre" the average. This is most simply carried out by taking
the mean of consecutive pairs of the four-point average. Thus, in Table 46.6 the mean
of the first four values is 122, and of the second to the fifth values is 196.25. The mean
of these two, 159.125, is taken as the trend-value corresponding to the third quarter
of 1951, namely where the original value of the series is 149. The process is clearly
equivalent to fitting a five-point average with weights

$\tfrac18[1, 2, 2, 2, 1].$ (46.114)

Proceeding in this manner on the data of Table 46.6, we find the residuals shown 
in Table 46.7. The deviations of the means (each based on seven quarters) from the
overall mean of 24.05/4 are 62.45, 86.17, −88.39, −60.26. In terms of the original
variables the corresponding values would be 306.25, 308.62, 291.16, 293.97 or, on
the basis of a mean of 100, 102.1, 102.9, 97.1, 98.0. These are substantially different
from the results of the method of 46.38.
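
The centred moving-average procedure may be sketched as follows. The Python is ours; the series is that of Table 46.6 read year by year, and the names `centred_residuals` and `seasonal_deviations` are illustrative. We assume, as in the text, that the first and last two quarters are simply lost, so that each quarter's mean residual rests on seven values.

```python
import numpy as np

# Table 46.6 read year by year (1951-58), four quarters per year.
SERIES = np.array([
    -50, 175, 149, 214,    247, 237, 225, 329,
    729, 809, 530, 489,    540, 457, 195, 176,
    337, 239, 128, 102,    232, 429,   3,  98,
     43, -141, -77, -13,   125, 361, -45, 184,
], dtype=float)

# Centred four-point average, equivalent to the five-point weights (1/8)[1, 2, 2, 2, 1].
WEIGHTS = np.array([1, 2, 2, 2, 1]) / 8.0


def centred_residuals(x):
    """Return (residuals, trend) for the 3rd to the (n-2)th terms of x."""
    trend = np.convolve(x, WEIGHTS, mode="valid")
    return x[2:-2] - trend, trend


def seasonal_deviations(x, period=4):
    """Mean residual for each quarter, measured from the mean of the four means."""
    resid, _ = centred_residuals(x)
    quarter = np.arange(len(x))[2:-2] % period      # 0 denotes the first quarter
    means = np.array([resid[quarter == q].mean() for q in range(period)])
    return means - means.mean()


residuals, trend = centred_residuals(SERIES)
deviations = seasonal_deviations(SERIES)
print(trend[0])                                    # trend-value for the third quarter of 1951
print(np.round(deviations, 2))                     # in the units of Table 46.6
print(np.round((3000 + deviations) / 30, 1))       # on the basis of a mean of 100
```

The deviations agree with those quoted in the text to within about 0.01, the small discrepancies being attributable to rounding in the published residuals.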


46.40 It is also of interest to see what happens if we eliminate trend by a more 
elaborate form of moving average, and we will consider the fitting of a cubic to seven 
points with weights $\tfrac{1}{21}[-2, 3, 6, 7]$. The residuals are shown in Table 46.8. The
seasonal indexes will be found to be
102.3, 102.3, 97.3, 98.1

as compared with
101.3, 102.6, 97.1, 99.0 (in 46.38)
102.1, 102.9, 97.1, 98.0 (in 46.39)

Although the general picture is the same in all cases (a seasonal peak in the second 
quarter, a seasonal trough in the third) there are large enough differences in these results 
to embarrass us in work requiring great accuracy. Our inclination would be to use 
the method of 46.39. That of 46.40 runs into some danger of fitting too well, in the 
sense that the trend-line may embody some part of the seasonal effect. It seems im- 
possible, however, to lay down any completely objective rules for the treatment of 
seasonal effect versus trend. Our general recommendation would be to try several 
methods and to choose the one which appears to give the most reasonable results; 
and, in any published work, to state exactly what has been done. 

Sometimes an iterative process gives good results: a rough trend is taken out, seasonal 
factors estimated, original figures corrected for seasonality, a revised trend estimated, 
and so on. Or again, instead of a simple average of seasonals as in 46.38, a moving 
average can be taken to give a moving seasonal index. 
46.41 From the point of view of spectrum analysis, which we discuss in Chapter 49,
there is more to be said about the effect of trend elimination both on random residuals 
and on seasonal components. We shall see that it is possible to correct the power 
spectrum for distortions due to trend fitting, at least in certain cases. 


46.42 There are now in existence some complicated routines written for elec- 
tronic computers which dissect a series into trend, seasonal and residual components. 
They may involve more than one process for the isolation of each component, for 
example by a preliminary smoothing, a first approximation to seasonals, a more refined 
trend fitting, a further approximation to seasonals, and so on. The proof of all these 
puddings is in the eating, and it seems fair to say that for a wide class of economic 
and social statistics such routines work quite well in practice. Reference may be 
made to Shiskin (1955), Eisenpress (1956), Shiskin and Eisenpress (1957), and Burman 
(1965) for some work on this subject. 


EXERCISES 


46.1 A straight line is fitted to 2m+1 points at equidistant unit intervals −m, ..., m.
Show that the line is

$\frac{1}{2m+1}\sum u + \frac{3t}{m(m+1)(2m+1)}\sum tu.$

Hence show that the sum of squares of coefficients of a moving average based on this line for
the point t = j is

$\frac{1}{2m+1}\left[1 + \frac{3j^2}{m(m+1)}\right].$


46.2 Fit a cubic to the last seven points of the sheep series of Table 45.4 and show that 
it gives a trend for the final four values of 1639, 1687, 1750 and 1807. 


46.3 Show that the weights in the Spencer 21-point formula are

$\frac{1}{350}[-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60]$

and that if it is applied to a random series the variance of the resultant is about one-seventh
of the original series, about the same reduction as would be given by a simple moving average
of sevens.


46.4 Show that Macaulay's 43-point formula 


1 7 
960 [12] [8] [5]* [5 —1, 0, 0, 0, 0, 0, 0, 1 


has weights

$\frac{1}{9600}[7, 18, 30, 40, 45, 28, -8, -60, -122, -178, -205, -190,$
$\qquad -127, -6, 163, 360, 562, 760, 928, 1050, 1127, 1156]$

and that it reduces the variance of a random series about as much as a simple average of nines.


46.5 If $\epsilon$ is a random series, show that the correlation between successive members of
$\Delta^k\epsilon$ for long series is $-k/(k+1)$ and hence tends to −1 as k increases. Hence show that the
signs of successive terms in $\Delta^k u$ tend to alternate, where u is the sum of a random element
and a systematic element representable by a polynomial; and verify by reference to Table 46.1.


46.6 By eliminating д? from (46.28), show that, for a cubic curve, an accurate trend-line 


is given by 
1 At—1 0-1 
INT (ez m- m) 


and generalize this result. (Cf. Higham, 1882-5) 


46.7 Show that in a long random series of normal elements with variance $\sigma^2$ the serial
correlations are uncorrelated, and that
var r = 1/n. 
Hence from (46.92) derive the large-sample formula 
m EE: 2p? 2p? (p—1)* 
(p+1)* +2)» T 
(Quenouille, 1953b) 


46.8 Show that in the notation of 46.30 as applied to a random series with variance о? 
and fourth cumulant кү, 


ars 2i+2j) / (2 2) oc 
cov (Vi, V3) x - (i БАРАТ T. ар, вау. 


(Quenouille, 1953b) 


46.9 In the previous exercise, given an estimator of the error variance 


mip 
t= X ай 
i=m+1 
with variance (x, +20) /п, show that t has minimal variance if 
т+р 
У agcd-420, ј = т+1,..., т+р, 
i=m+1 
т+р 
- Ў в=-1. (Quenouille, 1953Ь) 
i=m+1 


46.10 A series consists of a polynomial in t of degree m plus a component which has the 
same variance for all t but the successive values of which may be serially correlated. 
Define 
RE (0632 ду, ie 


Show that for a long series 


m,mt+1,... 


Rey = Vib, - Pr 20-0), 1600-1) (0-2) tard) 
i» = Vol 545" * (53) o4). (p-3)99 (0-5 * 
and hence that if the serial correlations higher than the first order vanish, E(Ryp) = 977. 


46.11 Continuing the previous exercise, show that, defining 
(p-- 2m — 1) (p 2m) 
(2m —1) (2m) TEN 
and if serial correlations higher than the mth vanish, that E(Rmp) = 071m. 
(Quenouille, 1953b) 


Ктр = 
46.12 In generalization of 46.34 show that if constants a are defined by
(+++... +27) = agta tat’ ... 
then the sets of weights 
Laisbis1, ®ав-1+1,‚ Eais—vbi+1 
fit a curve of degree d to sets of s points. 
(Quenouille, 1949a) 


46.13 If a circulant matrix A is given by

$A = \{a_{j-i}\}, \qquad i, j = 0, 1, \dots, n-1,$

where the suffixes are reduced to modulus n, show that

$|A| = \prod_{s=0}^{n-1}\sum_{r=0}^{n-1} a_r\omega^{rs}$

where $\omega$ is $\exp(2\pi i/n)$, an nth root of unity. Show also that the latent roots $\lambda$ of A are given by

$\lambda_s = \sum_{r} a_r\omega^{rs}, \qquad s = 0, 1, \dots, n-1.$

Show also that

$a_r = \frac{1}{n}\sum_{s=0}^{n-1}\lambda_s\omega^{-rs}.$

Hence, noting that the latent roots of $A^{-1}$ are $\lambda^{-1}$, show how to invert A.
(Good, 1950)


46.14 Starting from the relation 


(71? cos 296 = 1- P eost P G7 DQ easy, 


and putting z = е“, derive equation (46.93). 


ғ 
46.15 Acubic У ау г? is fitted by least squares to а set of values? = —m,...,0,...,m 
j=0 


with n = 2m+1, Show that it is given by 


з 5 2 2 
n(n?=1) (0*4) (Ge Gnt—Eu— (0 -1E¢t 2 
140t 5 
+ n(n®—1) (0$ —4) (1 —9) (3n! — 18n?4-31) E tu — (3*7) E. en) 
15: 
n(n*—1) (4) (79-1) 5и+12 2и) 
140" 


"apo oos C m7 ®ш+20 xiu). 


46.16 A random series has trend "eliminated" by the removal of a moving average with
weights $[a_{-m}, a_{-(m-1)}, \dots, a_0]$. Show that the serial correlation of residuals is given approximately
by

$\dfrac{\sum_{i=-m}^{m-k} a_i a_{i+k} - 2a_k}{\sum_{i=-m}^{m} a_i^2 - 2a_0 + 1}.$

Compare the values given by applying the formula to the residuals of Table 46.1 (col. (10)) with
the actual values $r_1 = -0.411$, $r_2 = -0.244$, $r_3 = 0.231$, $r_4 = 0.143$, $r_5 = 0.007$.


CHAPTER 47 
STATIONARY TIME-SERIES 


47.1 If we remove from a time-series the elements attributable to seasonal variation 
and trend we shall, in general, be left with a series oscillating about some constant 
value. This movement may be so small as to be virtually non-existent; the series
then consists entirely of seasonality or trend. Or the seasonality or trend may them- 
selves be non-existent, in which case the series is entirely oscillatory. In the present 
chapter we shall study these oscillatory series, supposing that trend and seasonal effects 
have been eliminated or do not exist. Strictly speaking, we ought, perhaps, to treat 
seasonal effects as part of the oscillatory movement and not regard them as eliminated 
beforehand. But we shall see that there are types of oscillation (the rule rather than 
the exception) which are not seasonal in our sense, and it is better to keep them distinct 
as far as possible. 


47.2 Let us begin with some intuitive ideas. In Table 45.1 (barley yields) we 
have an example of a series which fluctuates about a mean value to about the same 
extent. But we might have a series, as in Fig. 47.1(a), in which the extent of oscillation
systematically increased or, as in Fig. 47.1(b), in which the amount of oscillation
itself oscillated. We shall exclude such cases from discussion and confine our attention
to series for which the amplitude remains more or less constant. This does not mean
that the amplitude of the swings has to be exactly the same, but that there is no
systematic effect present.

Fig. 47.1 (a), (b) (see text)


47.3 To make these ideas precise, consider a set of random variables arranged in
order: $u_1, u_2, \dots, u_t, \dots$. Let the distribution function of any set of n consecutive
u's, say $u_{t+1}, u_{t+2}, \dots, u_{t+n}$, be

$F(u_{t+1}, u_{t+2}, \dots, u_{t+n}).$ (47.1)

Then if F is independent of t for all integral n > 0 we shall say that F represents a
stationary series. The distribution of any set of consecutive variables is the same,
wherever in the series we choose it. In particular, for n = 1, we see that the distributions
of all members of the series are identical.


In the theory of stochastic processes, of which stationary time-series are a particular 
case, it is customary to define stationarity of a less restrictive kind, e.g., a process is 
stationary in the mean if the expected values of all the u’s are the same, and it is stationary 
in the variance if all u's have the same variance. Most of the applications of the general 
stationarity property (47.1) which we shall make concern the constancy of mean and 
variance along the series, but important use will also be made of product-moments of 
the u’s and of the identity of distributions of the u's. 


47.4 In regarding a sequence of variables of unlimited extent as defined by a 
distribution function we arrive at a new problem in the definition of mean values. 
We can, first of all, consider the behaviour of some $u$, say $u_1$, for different series generated
by (47.1). Or, in the second place, we can consider the limit of some set of $u$'s
(or a function of them), say $u_1, u_2, \ldots, u_m$, as $m$ tends to infinity.

In the first class of case we have to consider averages in a population composed of 
different ways in which the series could happen. Each such series is possible, and 
any one which occurs is called a realization of the process. We may have only one 
realization to examine, and, indeed, this is the rule rather than the exception. In 
a sense, then, we have only a sample of one observation from the process. We shall see 
shortly why this does not seriously limit the possibility of inferences about the process. 


47.5 We shall, from now on, assume that the mean and variance of $u$ exist. We
then have, for all $t$,

$$\mu = E(u_t) = \int_{-\infty}^{\infty} u \, dF(u) \tag{47.2}$$

$$\sigma^2 = E(u_t - \mu)^2 = \int_{-\infty}^{\infty} (u - \mu)^2 \, dF(u). \tag{47.3}$$

We also assume that any pair of $u$'s have an autocovariance

$$\gamma_k = E\{(u_t - \mu)(u_{t+k} - \mu)\} = \gamma_{-k} \tag{47.4}$$

with the corresponding autocorrelation

$$\rho_k = \gamma_k / \sigma^2 = \rho_{-k}. \tag{47.5}$$

As noted in 45.33, the totality of coefficients $\rho_0\ (= 1), \rho_1, \rho_2, \ldots$ is called the correlogram
of the series. We may distinguish between the theoretical correlogram, based
on the autocorrelations, and the observed correlogram, based on the serial correlations 
calculated for any particular series of length n. 


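47.6 For any given $n$, the theoretical correlation matrix of the set $u_{t+1}, u_{t+2}, \ldots, u_{t+n}$
is the Laurent matrix

$$\begin{pmatrix}
1 & \rho_1 & \rho_2 & \cdots & \rho_{n-1} \\
\rho_1 & 1 & \rho_1 & \cdots & \rho_{n-2} \\
\vdots & \vdots & \vdots & & \vdots \\
\rho_{n-1} & \rho_{n-2} & \rho_{n-3} & \cdots & 1
\end{pmatrix}. \tag{47.6}$$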


Any diagonal running from North-West to South-East has the same elements. 


Example 47.1 

The fact that the Laurent matrix is non-negative definite implies certain consistency 
conditions on the autocorrelation coefficients. The determinant of any minor based 
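on the main diagonal cannot be negative. Thus, for example,

$$\begin{vmatrix} 1 & \rho_1 \\ \rho_1 & 1 \end{vmatrix} = 1 - \rho_1^2 \ge 0, \quad \text{a trivial result.}$$

$$\begin{vmatrix} 1 & \rho_1 & \rho_2 \\ \rho_1 & 1 & \rho_1 \\ \rho_2 & \rho_1 & 1 \end{vmatrix}
= 1 + 2\rho_1^2(\rho_2 - 1) - \rho_2^2 = (1 - \rho_2)(1 - 2\rho_1^2 + \rho_2) \ge 0. \tag{47.7}$$

Thus, unless $\rho_2 = 1$ (and even in that case) we have

$$\rho_2 \ge 2\rho_1^2 - 1,$$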


which is by no means a trivial result. 
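
The consistency condition of Example 47.1 can be checked numerically. The following is a minimal sketch in Python (standard library only); the function names `laurent_matrix`, `det` and `admissible` are illustrative choices and the trial values of the autocorrelations are arbitrary. It compares the third-order determinant with the factorized form (47.7).

```python
def laurent_matrix(rhos):
    """Build the Laurent (Toeplitz) correlation matrix from [rho_1, ..., rho_{n-1}]."""
    full = [1.0] + list(rhos)
    n = len(full)
    return [[full[abs(i - j)] for j in range(n)] for i in range(n)]


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-15:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def admissible(rho1, rho2):
    """Condition derived from (47.7): rho_2 >= 2 * rho_1**2 - 1."""
    return rho2 >= 2 * rho1 ** 2 - 1


# Arbitrary trial values: (0.8, 0.5) satisfies the condition, (0.9, 0.5) does not.
d_ok = det(laurent_matrix([0.8, 0.5]))
closed_form = (1 - 0.5) * (1 - 2 * 0.8 ** 2 + 0.5)
d_bad = det(laurent_matrix([0.9, 0.5]))
print(d_ok, closed_form, d_bad)
```

A negative determinant, as for the second pair, signals that no stationary series can have those two values as its first two autocorrelations.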


Example 47.2 
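As an example of a scheme which generates an autocorrelated stationary series,
consider a process defined by

$$u_{t+1} = \rho u_t + \varepsilon_{t+1} \tag{47.8}$$

where $\varepsilon$ is a random variable with zero mean, and values $\varepsilon_p$, $\varepsilon_q$ are uncorrelated for
$p \ne q$. We then have

$$E(u_{t+1}) = \rho E(u_t)$$

and, except perhaps in the trivial case $\rho = 1$, it follows that stationarity requires

$$E(u_t) = 0, \quad \text{all } t. \tag{47.9}$$

It will be seen from (47.8) that $u_t$ depends on $\varepsilon_t, \varepsilon_{t-1}, \varepsilon_{t-2}$, etc., but not on $\varepsilon_{t+1}$. Thus
we have

$$E\{u_t(u_{t+1} - \rho u_t)\} = E(u_t \varepsilon_{t+1}) = 0$$

and hence $\operatorname{cov}(u_t, u_{t+1}) = \rho \operatorname{var} u_t = \rho\sigma^2$, and the correlation between $u_t$ and $u_{t+1}$ is

$$\rho_1 = \rho. \tag{47.10}$$

If $\rho_k$ is the $k$th order autocorrelation we have likewise

$$E\{u_t(u_{t+k} - \rho u_{t+k-1})\} = \sigma^2(\rho_k - \rho\rho_{k-1}) = 0$$

and hence

$$\rho_k = \rho^k. \tag{47.11}$$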
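
The result (47.11) can be illustrated by simulation. The sketch below generates one long realization of the scheme (47.8) and compares its serial correlations with $\rho^k$. The normal distribution for $\varepsilon$, the value $\rho = 0.6$, the series length, the burn-in and the seed are all assumptions made for the illustration; the text requires only that $\varepsilon$ have zero mean and be serially uncorrelated.

```python
import random


def simulate_markoff(rho, n, seed=0, burn_in=200):
    """One realization of u_{t+1} = rho * u_t + eps_{t+1} with normal eps (an assumption)."""
    rng = random.Random(seed)
    u = 0.0
    out = []
    for i in range(n + burn_in):
        u = rho * u + rng.gauss(0.0, 1.0)
        if i >= burn_in:  # discard the start-up stretch
            out.append(u)
    return out


def serial_correlation(x, k):
    """Serial correlation of order k, measured about the mean of the realization."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t + k] - m) for t in range(n - k))
    den = sum((v - m) ** 2 for v in x)
    return num / den


series = simulate_markoff(rho=0.6, n=20000, seed=1)
for k in range(1, 5):
    print(k, round(serial_correlation(series, k), 3), round(0.6 ** k, 3))
```

The agreement is only approximate, since the serial correlations of a finite realization are subject to sampling error.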

Example 47.3

The fact that the Laurent matrix is symmetrical about its main diagonal has one
somewhat unexpected consequence for schemes generated in the manner of the previous
example. They could, in fact, equally well have been generated backwards, e.g. by
a relation of the type

$$u_t = \rho u_{t+1} + \eta_t \tag{47.12}$$

where $\eta$ is another random variable. As before, it will be seen that $u_t$ does not depend
on $\eta_{t-1}$, and hence, from (47.12) with $t-1$ instead of $t$, we derive

$$E\{u_t(u_{t-1} - \rho u_t)\} = E(u_t \eta_{t-1}) = 0,$$

giving for the first autocorrelation $\rho$, as before.


47.7 In time-series the transition from parent to sample, and inferences in the 
reverse direction, present some new problems which we shall consider in detail later. 


Table 47.1. Trend-free wheat-price index (European prices) compiled by the late Lord Beveridge

1131611 | 100] 1648 12211685 74]1722| 91 1759| 9111796 95} 1833 
89| 12| 99] 49 134] 86| 75] 23| 94] 60) 88] 97| 84] 34 
At this point, without prejudice to that discussion, we may present a few sample series
as illustrations of the type of material encountered in practice. Some of those in
Chapter 45 are, on the face of it, stationary in character, e.g. Table 45.1 (barley yields)
and Table 45.2 (rainfall). Table 47.1 is a famous series of trend-free wheat-price index
numbers compiled by the late Lord Beveridge. It extends over 370 years, a phenomenal
length of time for economic series. Table 47.2 gives deviations from a simple 11-point
moving average of marriage rates in England and Wales for the period 1843-1896.
Table 47.3 is an artificial series obtained by superposing a random term on a simple
harmonic. Table 47.4 is another artificial series generated by a more elaborate scheme
of the type of Example 47.2.


Table 47.2. Marriage rate in England and Wales: deviation from a simple
11-year moving average for the years 1843-1896
Units 1 in 10,000

Year | Marriage rate | Year | Marriage rate | Year | Marriage rate
1843 - 6 1861 -5 1879 -12 
44 | 1 62 -7 80 -5 
45 | 12 63 1 81 0 
46 10 64 6 82 5 
47 -6 65 | 8 83 i 
48 -8 66 9 84 3 
49 - 6 67 | -2 85 -4 
50 3 68 - 8 86 -8 
51 4 69 —10 87 = 16 
52 7 70 -7 88 -5 
53 11 71 0 89 1 
54 3 72 | 8 90 6 
55 = 8 73 12 91 6 
B5 | —- 74 7 92 2 
57 -3 75 5 93 - 6 
58 -7 76 | 4 94 -5 
59 | 3 77 -3 95 -6 
60 | 4 78 - 6 96 1 


47.8 Suppose now that we have an observed series $u'_1, \ldots, u'_n$, the primes denoting
the fact that this is a single realization. Each $u$ has mean $\mu$ and variance $\sigma^2$.
Let us define a time average

$$M_n(u') = \frac{1}{n} \sum_{t=1}^{n} u'_t \tag{47.13}$$

$$M(u') = \lim_{n \to \infty} M_n(u'). \tag{47.14}$$

Then we appeal to a theorem of Birkhoff (1931) and Khintchin (1932) which we state
without proof:

(a) If $u_t$ is stationary with finite mean $\mu$, $M(u')$ exists for almost all realizations, i.e. with
probability unity.

(b) If and only if

$$\lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} \rho_j = 0 \tag{47.15}$$

and (a) is satisfied, the average $M(u')$ is equal to the average $E(u)$, viz.

$$E(u_t) = M(u'). \tag{47.16}$$


This is a most important result. It implies that in practice we can estimate the mean 
of и from the mean of the successive values of a single realization. If this were not 
so, estimation from single realizations would be practically impossible. 


Table 47.3. Values of the series $u_t = 10 \sin(\pi t/5) + \varepsilon_t$, where $\varepsilon_t$ is a rectangular
random variable with range −5 to +5, rounded off to nearest unit

Number of term | Value of series | Number of term | Value of series | Number of term | Value of series
1 3 21 11 41 5 
2 8 22 | 13 42 12 
3 6 23 10 43 | 7 
4 2 24 6 44 | 5 
5 = 4 25 -5 45 | 3 
6 -7 26 -8 46 -2 
7 -9 27 -12 47 | =12 
8 -9 28 -10 48 -12 
9 —10 29 -7 4 | —8 
10 -1 30 0 50 -1 
11 8 31 1 51 11 
12 7 32 8 52 13 
13 6 | 13 53 12 
14 4 34 7 54 7 
5 | 53 35 4 55 5 
16 —10 36 -9 56 -1 
17 -11 37 -9 57 - 6 
18 -15 38 -6 58 -14 
19 | -4 39 = 4 59 -8 
20 4 40 -2 60 1 

Fig. 47.2. Graph of the values of Table 47.3 (ordinate: values of series, from −15 to 15)

Table 47.4. Values of the series $u_t = 1.1u_{t-1} - 0.5u_{t-2} + \varepsilon_t$, where $\varepsilon_t$ is a rectangular
random variable with range −9.5 to 9.5, rounded off to nearest unit

Number of term | Value of series | Number of term | Value of series | Number of term | Value of series
Te | 7 23 = 4 45 -13 
be «| 6 И Е 46 1 
3 - 6 25 -9 47 6 
4 ay 26 | =i 48 4 
5 3 29» | Creare 49 | 11 
6 -4 28" 4 3 50 15 
7 -5 | 9 51 9 
8 -1 30 | 4 52 8 
б | 10 3 | -8 53 4 
10 | 10 32 - 6 54 -1 
11 6 aget ре 1248 E 4 
12 -4 4 | -2 56 7 
13 -4 35 | 0 57 11 
14 -7 uo ATE SIT 58 0 
15 -2 sp up mars 59 1 
16 6 38 | 3 60 0 
17 17 39 -1 61 -5 
i | 24 40 -8 62 -11 
19 | 17 41 -3 63 -8 
20 4 42 -8 64 -3 
21 1 43 -10 65 5 
22 -5 + -16 


Fig. 47.3. Graph of the values of Table 47.4 (ordinate: values of series)

47.9 If $M(u') = E(u)$ and the second moment $E(u - \mu)^2$ is finite, the process is
called ergodic. We state, again without proof, an important extension of the Birkhoff-Khintchin
result: for an ergodic process it is true, for almost all realizations, that the
autocorrelation is equal to the correlation calculated from a realization,

$$\rho_k = \frac{M\{(u'_t - M(u'))(u'_{t+k} - M(u'))\}}{M\{(u'_t - M(u'))^2\}}. \tag{47.17}$$

Here, again, the result enables us to estimate autocorrelations from a single realization.
The condition (47.15) is not very restrictive, but it is not purely formal either. It will
be obeyed if the autocorrelations dwindle to zero as the terms to which they relate
become further apart, but not, for instance, if the series is a harmonic. There are,
in fact, further conditions to be satisfied before full ergodicity is attained.


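47.10 The correlogram, as we shall see later, is a useful instrument for exploring
the nature of the internal structure of a time-series. There is a second function which
serves a similar purpose and stands to the correlogram in much the same
relation as the characteristic function to the frequency function.

In fact, let us define a function

$$W(\alpha) = \alpha + 2 \sum_{j=1}^{\infty} \frac{\rho_j}{j} \sin \alpha j, \quad 0 \le \alpha \le \pi. \tag{47.18}$$

If, for some $n$ onwards, $|\rho_j| < 1$, this converges. We also write, subject to existence,

$$w(\alpha) = \frac{dW}{d\alpha} = 1 + 2 \sum_{j=1}^{\infty} \rho_j \cos \alpha j. \tag{47.19}$$

In virtue of the relation $\rho_j = \rho_{-j}$ and the fact that $\sin\theta$ is an odd function of $\theta$, we
may also write

$$w(\alpha) = \sum_{j=-\infty}^{\infty} \rho_j \cos \alpha j \tag{47.20}$$

$$= \sum_{j=-\infty}^{\infty} \rho_j e^{i\alpha j}. \tag{47.21}$$

This last form exhibits $w(\alpha)$ as a Fourier transform of the sequence $\rho$. Multiplying
(47.20) by $\cos \alpha k$ and integrating term by term, we find

$$\int_0^{\pi} w(\alpha) \cos k\alpha \, d\alpha = \sum_{j=-\infty}^{\infty} \rho_j \int_0^{\pi} \cos \alpha j \cos \alpha k \, d\alpha = \pi \rho_k$$

and hence

$$\rho_j = \frac{1}{\pi} \int_0^{\pi} \cos \alpha j \; w(\alpha) \, d\alpha = \frac{1}{\pi} \int_0^{\pi} \cos \alpha j \, dW(\alpha). \tag{47.22}$$

We may also write

$$\rho_j = \frac{1}{2\pi} \int_0^{2\pi} w(\alpha) e^{i\alpha j} \, d\alpha. \tag{47.23}$$

$W(\alpha)$ is called the spectral function. Its derivative $w(\alpha)$ is called the spectral density.
The graph of $w(\alpha)$ as ordinate against $\alpha$ as abscissa is called the (power) spectrum.
$w(\alpha)$ has period $2\pi$. From (47.18) we see that $W(0) = 0$, $W(\pi) = \pi$, $W(2\pi) = 2\pi$.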
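47.11 The power spectrum may also be introduced directly, without reference to
the correlogram, in the following manner.

For some given $\alpha$, let us consider the correlation or covariance of $u_t$ and $\cos \alpha t$. If
there is some rhythm (or pseudo-rhythm; let us not beg any questions) in $u_t$ with
frequency $\alpha$, the correlation will be high provided that $u_t$ and $\cos \alpha t$ are in phase. If
the series is at unit intervals and measured about its mean, consider

$$a(\alpha) = \frac{1}{\sqrt{n\pi}} \sum_{t=1}^{n} u_t \cos \alpha t, \qquad
b(\alpha) = \frac{1}{\sqrt{n\pi}} \sum_{t=1}^{n} u_t \sin \alpha t. \tag{47.24}$$

We have

$$I(\alpha) = a^2(\alpha) + b^2(\alpha)$$

$$= \frac{1}{n\pi} \left\{ \Big(\sum u_t \cos \alpha t\Big)^2 + \Big(\sum u_t \sin \alpha t\Big)^2 \right\}$$

$$= \frac{1}{n\pi} \left\{ \sum_{t=1}^{n} u_t^2 + 2 \sum_{k=1}^{n-1} \sum_{t=1}^{n-k} u_t u_{t+k} \big(\cos \alpha t \cos \alpha(t+k) + \sin \alpha t \sin \alpha(t+k)\big) \right\}$$

$$= \frac{1}{n\pi} \left\{ \sum_{t=1}^{n} u_t^2 + 2 \sum_{k=1}^{n-1} \sum_{t=1}^{n-k} u_t u_{t+k} \cos \alpha k \right\}$$

$$= \frac{s^2}{\pi} \left\{ 1 + 2 \sum_{k=1}^{n-1} r_k \cos \alpha k \right\}, \tag{47.25}$$

where $s^2 = \sum u_t^2 / n$ and $r_k$ is the correlation-type coefficient $\sum u_t u_{t+k} / \sum u_t^2$. In the limit
this becomes

$$I(\alpha) = \frac{\sigma^2}{\pi} \left\{ 1 + 2 \sum_{k=1}^{\infty} \rho_k \cos \alpha k \right\} \tag{47.26}$$

$$= \frac{\sigma^2}{\pi} \sum_{k=-\infty}^{\infty} \rho_k \cos \alpha k = \frac{\sigma^2}{\pi} w(\alpha). \tag{47.27}$$

The quantity $I$, which we call the intensity, is thus the spectral density multiplied by
$\sigma^2/\pi$.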


47.12 It is customary to graph the spectrum with $I$ as ordinate against $\alpha$ as abscissa.
In the earlier stages of the development of the subject it was more usual to compute

$$A = \frac{2}{n} \sum_{t=1}^{n} u_t \cos \frac{2\pi t}{\lambda}, \qquad \lambda = 2\pi/\alpha, \tag{47.28}$$

$$B = \frac{2}{n} \sum_{t=1}^{n} u_t \sin \frac{2\pi t}{\lambda} \tag{47.29}$$

and calculate

$$S^2 = A^2 + B^2 = \frac{4\pi}{n} I(\alpha) \tag{47.30}$$

in the limit. The graph of $S^2$ against $\lambda$, the wavelength, was called the periodogram,
and this is a terminology we shall preserve, although some authors use "periodogram"
for what we call the "spectrum."

It will be noted that whereas in (47.24) we have a divisor in $\sqrt{n}$, in (47.28) and
(47.29) we have a divisor in $n$. The reason for this is that if $u$ contains a harmonic
element, say $\cos \alpha t$, of amplitude $c$, and the other parts of the series are uncorrelated
with it, the value of $S^2$ at $\alpha$ is $c^2$. In the spectrum the ordinate would be infinite
at that point, at least for an infinite series. We shall return to the subject in Chapter 49.
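
The remark about the divisors can be verified on a pure harmonic. The sketch below computes the periodogram ordinate $S^2$ of (47.28) to (47.30) and the intensity $I(\alpha)$ of (47.24) for the noise-free series $10 \sin(\pi t/5)$, $t = 1, \ldots, 60$ (the harmonic part of Table 47.3, with the random term omitted as a simplifying assumption). The function names are illustrative only.

```python
import math


def periodogram_ordinate(u, wavelength):
    """S^2 = A^2 + B^2 with A, B as in (47.28) and (47.29); u[0] is the term t = 1."""
    n = len(u)
    a = 2.0 / n * sum(u[t - 1] * math.cos(2 * math.pi * t / wavelength)
                      for t in range(1, n + 1))
    b = 2.0 / n * sum(u[t - 1] * math.sin(2 * math.pi * t / wavelength)
                      for t in range(1, n + 1))
    return a * a + b * b


def intensity(u, alpha):
    """I(alpha) = a^2 + b^2 with the divisor sqrt(n * pi) of (47.24)."""
    n = len(u)
    c = sum(u[t - 1] * math.cos(alpha * t) for t in range(1, n + 1))
    s = sum(u[t - 1] * math.sin(alpha * t) for t in range(1, n + 1))
    return (c * c + s * s) / (n * math.pi)


# Pure harmonic of amplitude 10 and wavelength 10, observed over six whole periods.
u = [10 * math.sin(math.pi * t / 5) for t in range(1, 61)]
s2_at_10 = periodogram_ordinate(u, 10)  # close to the squared amplitude, 100
s2_at_6 = periodogram_ordinate(u, 6)    # close to zero at a non-matching wavelength
print(s2_at_10, s2_at_6)
```

At the true wavelength $S^2$ reproduces the squared amplitude whatever the length of the series, whereas $I(\alpha) = nS^2/4\pi$ grows in proportion to $n$.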


47.13 The use which is made of the correlogram or the spectrum in exploring
the internal structure of a time-series depends to some extent on the purpose of the
inquiry and prior knowledge of the generating system. Broadly speaking, the correlogram
is more revealing in economics, the spectrum in physics, but there are areas
where the prudent research worker would use both (e.g. oceanography, meteorology,
and some biological processes). The correlogram, as we have remarked, tells us something
about the relationship between values of the series which are separated in time.
The spectrum exhibits the extent to which the series is in step with certain fundamental
rhythms; calculating the spectrum is like tuning a radio set, a signal of high power
being obtained when the trial frequency coincides with an incoming frequency. For
this reason the peaks of the spectrum, if any, are sometimes identified with harmonic
terms in the generating system, but this is a procedure which must be carried out
with some care in interpretation.


Autocorrelation generating function 
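47.14 In the spectral density function

$$w(\alpha) = \sum_{j=-\infty}^{\infty} \rho_j e^{i\alpha j} \tag{47.31}$$

put

$$z = e^{i\alpha}. \tag{47.32}$$

Then

$$w(\alpha) = \sum_{j=-\infty}^{\infty} \rho_j z^j = G(z), \text{ say,} \tag{47.33}$$

and we thus derive an autocorrelation generating function. We shall also find it useful
to work with an autocovariance generating function

$$C(z) = \sum_{j=-\infty}^{\infty} \gamma_j z^j = \sigma^2 G(z). \tag{47.34}$$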


Moving-average series 
47.15 Consider now a series $u_t$ and a moving average defined by

$$\xi_t = \sum_{j=0}^{\infty} a_j u_{t-j}. \tag{47.35}$$

We have here taken the moving average to be of infinite extent, so as to attain
generality. But we must then remark that $\xi_t$ is not necessarily ergodic. It will
be so, however, if $\sum a_j^2$ converges, a condition which we assume to be satisfied.
We then have

$$E(\xi_t \xi_{t+k}) = E\Big\{ \sum_{j=0}^{\infty} a_j u_{t-j} \sum_{l=0}^{\infty} a_l u_{t+k-l} \Big\}
= \sum_{j=0}^{\infty} \sum_{l=0}^{\infty} a_j a_l E(u_{t-j} u_{t+k-l})
= \sum_{j=0}^{\infty} \sum_{l=0}^{\infty} a_j a_l \gamma_{k+j-l}. \tag{47.36}$$

If the autocovariance generating function of $u$ is $C(z)$, it will be seen that the autocovariance
generating function of $\xi$ is given by

$$\Gamma(z) = \Big(\sum a_j z^j\Big)\Big(\sum a_j z^{-j}\Big) C(z). \tag{47.37}$$

In particular, if $u$ is a purely random series (with unit variance) $C(z) = 1$, and if we iterate a moving average
$k$ times, the autocovariance generating function of the resulting series is

$$\Big(\sum a_j z^j\Big)^k \Big(\sum a_j z^{-j}\Big)^k. \tag{47.38}$$

Example 47.4

Example 47.4 


Consider a moving average of 2, $a_0 = a_1 = \tfrac12$, of a random series.

$$\Gamma(z) = \tfrac14 (1 + z)(1 + z^{-1}) = \tfrac14 (z^{-1} + 2 + z). \tag{47.39}$$

This gives us, as is otherwise obvious,

$$\rho_1 = \tfrac12, \qquad \rho_j = 0, \quad j \ne 0, 1.$$

The corresponding autocorrelation generating function is derived by standardizing
(47.39), so that the coefficient of $z^0$ is unity. Thus $G(z) = \tfrac12 (z^{-1} + 2 + z)$. Put now
$z = e^{i\alpha}$. We find for the spectral density function

$$w(\alpha) = \tfrac12 (e^{-i\alpha} + 2 + e^{i\alpha}) = 1 + \cos \alpha. \tag{47.40}$$

The function is thus a cosine curve with a maximum at $\alpha = 0$. If we iterate the
average $k$ times we find

$$w(\alpha) \propto (1 + \cos \alpha)^k. \tag{47.41}$$

The constant term by which $(1 + \cos \alpha)^k$ is to be multiplied to give $w(\alpha)$ is most easily
determined by the condition in 47.10 that

$$\int_0^{\pi} w(\alpha) \, d\alpha = \pi.$$

In our present case this gives

$$w(\alpha) = \frac{\pi^{1/2}\, \Gamma(k+1)}{\Gamma(k+\tfrac12)} \left( \frac{1 + \cos \alpha}{2} \right)^k.$$

This, for increasing $k$, tends to infinity at $\alpha = 0$ and zero elsewhere. All the $\rho_j$ tend to
unity. The series thus tends to a constant value, as is otherwise evident from the fact
that successive iterations smooth out fluctuation.

On the other hand, if we take successive differences of the original random series,
$a_0 = \tfrac12$, $a_1 = -\tfrac12$, and we find after $k$ differences

$$w(\alpha) \propto \left( \frac{1 - \cos \alpha}{2} \right)^k. \tag{47.42}$$

This tends to infinity at $\alpha = \pi$. The even order autocorrelations tend to $+1$, the odd
order to $-1$. The series thus tends to terms which are equal in absolute value but
alternate in sign.
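
The algebra of Example 47.4 can be checked numerically. The sketch below forms the weights of the $k$-times iterated average by repeated convolution, derives the autocorrelations of the averaged random series, and compares the spectral density $1 + 2\sum \rho_j \cos \alpha j$ with the closed form involving Gamma functions. The function names and the choice $k = 3$ are illustrative assumptions.

```python
import math


def iterate_weights(weights, k):
    """Weights of a moving average applied k times (k-fold convolution)."""
    out = [1.0]
    for _ in range(k):
        new = [0.0] * (len(out) + len(weights) - 1)
        for i, a in enumerate(out):
            for j, b in enumerate(weights):
                new[i + j] += a * b
        out = new
    return out


def ma_autocorrelations(weights):
    """Autocorrelations rho_0, rho_1, ... of a moving average of a random series."""
    h = len(weights)
    gamma = [sum(weights[i] * weights[i + j] for i in range(h - j)) for j in range(h)]
    return [g / gamma[0] for g in gamma]


def spectral_density(rhos, alpha):
    """w(alpha) = 1 + 2 * sum_{j >= 1} rho_j * cos(alpha * j), as in (47.19)."""
    return rhos[0] + 2 * sum(r * math.cos(alpha * j) for j, r in enumerate(rhos) if j > 0)


rhos1 = ma_autocorrelations([0.5, 0.5])                      # single average of two
rhos3 = ma_autocorrelations(iterate_weights([0.5, 0.5], 3))  # iterated three times
const3 = math.sqrt(math.pi) * math.gamma(4) / math.gamma(3.5)
for alpha in (0.0, 1.0, 2.5):
    print(alpha, spectral_density(rhos3, alpha), const3 * ((1 + math.cos(alpha)) / 2) ** 3)
```

Replacing the weights by `[0.5, -0.5]` gives the differencing case (47.42), with the density concentrated near $\alpha = \pi$ instead of $\alpha = 0$.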

Example 47.5
The moving average with weights

$$\tfrac16 [-1, 2, 4, 2, -1]$$

is repeatedly applied to a random series. The autocovariance generating function
after $k$ iterations is then

$$6^{-2k} \{ -z^{-2} + 2z^{-1} + 4 + 2z - z^2 \}^{2k}.$$

Putting $z = e^{i\alpha}$ we find

$$w(\alpha) = c_k (3 + 2 \cos \alpha - 2 \cos^2 \alpha)^{2k} \tag{47.43}$$

where $c_k$ is some constant. For variations in $\alpha$ the maximum value of $w(\alpha)$ occurs
when $\cos \alpha = \tfrac12$ and we may write

$$w(\alpha) = c'_k \{ 1 - \tfrac47 (\cos \alpha - \tfrac12)^2 \}^{2k}. \tag{47.44}$$

Thus, for $\cos \alpha - \tfrac12 = \varepsilon$, say, the ordinate of $w(\alpha)$ tends to zero, as $k$ increases, compared
with the ordinate at $\cos \alpha = \tfrac12$. For continual iteration, therefore, the resultant series
tends to a periodic wave with period $2\pi / \arccos \tfrac12 = 6$.
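
The location of the peak in Example 47.5 can be confirmed by a direct search. In the sketch below, `transfer` evaluates $\sum a_j z^j$ on the unit circle for the symmetric weights (centred on the middle term, an indexing choice made here for convenience), so that the spectral density after $k$ iterations is proportional to its $2k$th power. The grid size and the value $k = 50$ are arbitrary.

```python
import math

WEIGHTS = [-1 / 6, 2 / 6, 4 / 6, 2 / 6, -1 / 6]


def transfer(alpha):
    """Sum of a_j * cos(alpha * j) with the weights centred on the middle term."""
    centre = len(WEIGHTS) // 2
    return sum(w * math.cos(alpha * (j - centre)) for j, w in enumerate(WEIGHTS))


def peak_frequency(grid=6000):
    """Grid search over 0..pi for the frequency maximizing |transfer|."""
    alphas = [math.pi * i / grid for i in range(grid + 1)]
    return max(alphas, key=lambda a: abs(transfer(a)))


def relative_density(alpha, k):
    """w(alpha) relative to its value at cos(alpha) = 1/2, after k iterations."""
    return (transfer(alpha) / transfer(math.pi / 3)) ** (2 * k)


peak = peak_frequency()
print(peak, 2 * math.pi / peak)       # near pi/3, i.e. a period near 6
print(relative_density(0.0, 50))      # ordinate at alpha = 0 is negligible by k = 50
```

The relative ordinate away from the peak shrinks geometrically with $k$, which is the sense in which the iterated series tends to a wave of period 6.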


Example 47.6 Slutzky's theorem of the sinusoidal limit
Take a moving average of two of a random series $n$ times, and take the $m$th difference
of the result. Then, if $n \to \infty$ such that $m/n$ tends to some constant $\theta$ between
0 and 1, the series tends to a sine wave with wavelength $\lambda$ given by
$\lambda = 2\pi / \arccos\{(1-\theta)/(1+\theta)\}$.

Taking the $m$th difference is equivalent to taking first differences $m$ times. Hence
the autocovariance generating function of the resultant is given by (*)

$$\Gamma(z) \propto (1 + z^{-1})^n (1 + z)^n (1 - z^{-1})^m (1 - z)^m,$$

and hence, putting $z = e^{i\alpha}$, we find

$$w(\alpha) \propto (1 - \cos \alpha)^m (1 + \cos \alpha)^n. \tag{47.45}$$

We can evaluate the constant from the relation

$$\int_0^{\pi} w(\alpha) \, d\alpha = \pi$$

and find

$$w(\alpha) = \frac{\pi\, \Gamma(m+n+1)}{2^{m+n}\, \Gamma(m+\tfrac12)\, \Gamma(n+\tfrac12)} (1 - \cos \alpha)^m (1 + \cos \alpha)^n. \tag{47.46}$$

The maximum value occurs at $\alpha = \alpha_0$, when

$$\cos \alpha_0 = \frac{n - m}{n + m} = \frac{1 - \theta}{1 + \theta}. \tag{47.47}$$

The theorem then follows if we show that $w(\alpha) \to 0$ everywhere except at this maximum;
and further that the series is not only periodic but a sine wave.
Using Stirling's approximation to the $\Gamma$ function, we find

$$w(\alpha) \sim \sqrt{\frac{\pi}{2}}\; \frac{(m+n)^{m+n+\frac12}}{2^{m+n}\, m^m\, n^n} (1 - \cos \alpha)^m (1 + \cos \alpha)^n.$$

If $\cos \alpha = \{(n-m)/(n+m)\} + \varepsilon$ this tends to

$$\sqrt{\frac{\pi}{2}}\, (m+n)^{\frac12} \left\{ 1 - \frac{(m+n)\varepsilon}{2m} \right\}^m \left\{ 1 + \frac{(m+n)\varepsilon}{2n} \right\}^n. \tag{47.48}$$

For $\varepsilon = 0$ this tends to infinity. For $\varepsilon \ne 0$ it will be seen, on taking logarithms
and expanding, that the expression tends to zero uniformly in any closed interval
excluding $\varepsilon = 0$. Thus $w(\alpha)$ has a single infinite ordinate at $\alpha_0$ given by
$\arccos\{(1-\theta)/(1+\theta)\}$.

$W(\alpha)$ is accordingly a step function with a single step from 0 to $\pi$ at that point.

It follows from (47.22) that the autocorrelations of the resulting series are given by

$$\rho_j = \cos \alpha_0 j. \tag{47.49}$$

Consider now a given stretch of the derived series, say $u_1, \ldots, u_N$ for fixed $N$ as
$n \to \infty$. We have

$$E \sum_{t=1}^{N-2} (u_{t+2} - 2\rho_1 u_{t+1} + u_t)^2 = 2(N-2) \operatorname{var} u \, \{1 - 2\rho_1^2 + \rho_2\},$$

which, in virtue of (47.49), becomes in the limit

$$2(N-2) \operatorname{var} u \, (1 - 2\cos^2 \alpha_0 + \cos 2\alpha_0) = 0.$$

Hence in the limit

$$u_{t+2} - 2\rho_1 u_{t+1} + u_t = 0. \tag{47.50}$$

This is a difference equation of a sine curve.

For some generalizations of Slutzky's result see Romanovsky (1932, 1933) and
Moran (1949).

(*) The symbol $\Gamma$ with $z$ as argument is used here to denote the autocovariance generating
function. For other arguments, it denotes the usual Gamma function.


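47.16 If $u_t$ is a stationary series the moving average

$$\xi_t = \beta_0 u_t + \beta_1 u_{t-1} + \ldots + \beta_h u_{t-h} \tag{47.51}$$

is also stationary. In particular, if $u_t$ is a purely random process with zero mean the
autocorrelations are given by

$$\rho_j = \frac{\sum_{i=0}^{h-j} \beta_i \beta_{i+j}}{\sum_{i=0}^{h} \beta_i^2}, \quad j \le h; \qquad \rho_j = 0, \quad j > h. \tag{47.52}$$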

We have already seen in Example 46.7 that the correlogram may present an oscillatory 
appearance for such an average. 


47.17 Wold (1938) has proved a theorem on the conditions under which a specified
set of constants $\rho_1, \rho_2, \ldots, \rho_h$ can be the autocorrelations of a moving average of a
random series. Take the generating function

$$G(z) = 1 + \rho_1(z + z^{-1}) + \ldots + \rho_h(z^h + z^{-h}). \tag{47.53}$$

Put

$$y = z + z^{-1}. \tag{47.54}$$

This will transform $G(z)$ into a polynomial of degree $h$ in $y$, say $H(y)$. Then, for
the $\rho$'s to be autocorrelations of a moving average of extent $h+1$ it is necessary and
sufficient that $H(y)$ has no real root of odd multiplicity in the interval $-2 < y < 2$.

For example, suppose $\rho_1 \ne 0$ and all other $\rho$'s vanish.
$G(z)$ is then $1 + \rho_1(z + z^{-1})$ and

$$H(y) = 1 + \rho_1 y.$$

This has a root of odd-order multiplicity (unity) equal to $-1/\rho_1$. This will lie in the
interval $-2$ to $+2$ unless $|\rho_1| \le \tfrac12$. Thus, no moving average of extent 2 can have
$\rho_1 > \tfrac12$, $\rho_2 = \rho_3 = \ldots = 0$, as is otherwise evident.

We have to determine the $\beta$'s in the moving average from the relation

$$G(z) = \sum_{j=-h}^{h} \rho_j z^j = \Big(\sum_{j=0}^{h} \beta_j z^j\Big)\Big(\sum_{j=0}^{h} \beta_j z^{-j}\Big), \tag{47.55}$$

the $\beta$'s being scaled so that $\sum \beta_j^2 = 1$.
There will, in general, be $2^h$ solutions (for, on identifying powers of $z$, we shall have
$h$ equations each of which is quadratic). However, only one of these gives roots of
$G(z) = 0$ which lie inside the unit circle, and in virtue of another result to which we
refer later (47.18), this is the only acceptable solution. From (47.54) it is seen that
for any $y$, the roots in $z$ are given by

$$z^2 - zy + 1 = 0 \tag{47.56}$$

and hence, having the product unity, lie one inside and one outside the unit circle.
From (47.56) we have

$$z = \tfrac12 y \pm \sqrt{\tfrac14 y^2 - 1}. \tag{47.57}$$

Three cases arise:

(a) $H(y)$ has a complex root $y_1$. Then the conjugate $\bar{y}_1$ is also a root. Thus the
corresponding quantities $z_1, z_1^{-1}, \bar{z}_1, (\bar{z}_1)^{-1}$ are roots of $G(z)$ and thus one of
$(z - z_1)(z - \bar{z}_1)$ and $(z - z_1^{-1})(z - \bar{z}_1^{-1})$ is a factor of $\sum \beta_j z^j$.

(b) $H(y)$ has a real root $\ge 2$ in modulus. Then, from (47.57), $z_1$ and $z_1^{-1}$ are both
real. One must be a root of $\sum \beta_j z^j = 0$ and this case then corresponds to real
roots of $\sum \beta_j z^j = 0$.

(c) $H(y)$ has a real root $< 2$ in modulus. In this case $z_1$ and $z_1^{-1}$ are conjugate complex
and of modulus unity. The factors $z - z_1$ and $z - z_1^{-1}$ are therefore both contained
in $\sum \beta_j z^j$ and $\sum \beta_j z^{-j}$ and therefore the root must be of even multiplicity.

The theorem follows.
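
A rough numerical check of Wold's condition can be sketched as follows. On the unit circle $z = e^{i\alpha}$ we have $y = 2\cos\alpha$, so that $H(2\cos\alpha) = G(e^{i\alpha})$; a real root of odd multiplicity inside $(-2, 2)$ then shows up as a change of sign of this quantity for some $\alpha$ between 0 and $\pi$. The sketch below therefore simply tests whether $1 + 2\sum \rho_j \cos j\alpha$ stays non-negative on a grid. This reading of the condition, the function names, the grid size and the tolerance are assumptions of the illustration, and a grid search is not a proof.

```python
import math


def h_on_circle(rhos, alpha):
    """H(2 cos alpha) = 1 + 2 * sum_j rho_j * cos(j * alpha), for rhos = [rho_1, ..., rho_h]."""
    return 1 + 2 * sum(r * math.cos((j + 1) * alpha) for j, r in enumerate(rhos))


def looks_like_ma_correlogram(rhos, grid=2000, tol=1e-12):
    """True if no sign change is detected on the grid (a numerical check only)."""
    return all(h_on_circle(rhos, math.pi * i / grid) >= -tol for i in range(grid + 1))


for rho1 in (0.4, 0.5, 0.6, -0.5):
    print(rho1, looks_like_ma_correlogram([rho1]))
```

For $h = 1$ the check reproduces the bound of the worked example: values of $\rho_1$ up to $\tfrac12$ in modulus pass, and $\rho_1 = 0.6$ fails.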


Autoregressive series 
47.18 Consider now a series defined by

$$u_t = -\alpha_1 u_{t-1} - \alpha_2 u_{t-2} - \ldots - \alpha_h u_{t-h} + \varepsilon_t \tag{47.58}$$

which (putting $\alpha_0 = 1$) we may write in the more convenient form

$$\sum_{j=0}^{h} \alpha_j u_{t-j} = \varepsilon_t. \tag{47.59}$$

Here $\varepsilon_t$ is a random variable and, unless otherwise specified, we shall suppose that
successive values of $\varepsilon$ are independent and all have the same variance.
If $D$ is an operator such that $D u_t = u_{t-1}$ we may write (47.59) as

$$\Big(\sum \alpha_j D^j\Big) u_t = \varepsilon_t,$$

giving the formal solution

$$u_t = \frac{1}{\sum \alpha_j D^j}\, \varepsilon_t = \Big(\sum_{j=0}^{\infty} \beta_j D^j\Big) \varepsilon_t = \sum_{j=0}^{\infty} \beta_j \varepsilon_{t-j} \tag{47.60}$$

where the constants $\beta$ are related to the $\alpha$'s by the identity in $D$

$$\frac{1}{\sum_{j=0}^{h} \alpha_j D^j} = \sum_{j=0}^{\infty} \beta_j D^j. \tag{47.61}$$

However, this is not the complete solution of the difference equation (47.59). Let
$z_1, z_2, \ldots, z_h$ be the roots of

$$z^h + \alpha_1 z^{h-1} + \ldots + \alpha_h = 0. \tag{47.62}$$

Then the general solution of

$$\sum_{j=0}^{h} \alpha_j u_{t-j} = 0 \tag{47.63}$$

may be written

$$u_t = \sum_{j=1}^{h} A_j z_j^t \tag{47.64}$$

where the $A$'s are arbitrary constants.

We shall now assume that $|z_j| < 1$ for any $j$, namely that the roots of (47.62) all
fall within the unit circle, and that they are all different. Then for large $t$ the solution
(47.64) damps out of existence.

The series (47.59) is regarded as having "started up" a long time in the past.
Then the contribution of the solution (47.64) has disappeared and the complete solution
is, in fact, the particular solution (47.60).

We shall call a series of form (47.58) autoregressive. It is a type of moving average of
infinite extent. If $\varepsilon$ is ergodic, then so will be $u_t$, provided that $\sum \beta_j^2$ converges. This
proviso is, in fact, satisfied, provided that the roots of (47.62) are all within the unit
circle (cf. Exercise 47.19).

In practice it is rarely necessary to discuss the roots of equation (47.62). J. Wise
(1956), however, has shown by using a theorem of Routh, that the conditions concerning
the roots can be expressed as algebraic conditions on the $\alpha$'s themselves.


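47.19 It will have been observed that $u_t$ is dependent on $\varepsilon_t, \varepsilon_{t-1}$, etc., but not
on $\varepsilon_{t+1}, \varepsilon_{t+2}$, etc. Let us multiply

$$\sum_{j=0}^{h} \alpha_j u_{t-j} = \varepsilon_t \tag{47.65}$$

by $u_{t-k}$, and take expectations. We then find

$$\rho_k + \alpha_1 \rho_{k-1} + \ldots + \alpha_h \rho_{k-h} = 0, \quad k > 0, \tag{47.66}$$

a set of equations due to Yule (1927) and G. Walker (1931). In particular, since $\rho_{-j} = \rho_j$,
we have

$$\rho_1 + \alpha_1 + \alpha_2 \rho_1 + \ldots + \alpha_h \rho_{h-1} = 0$$

$$\rho_2 + \alpha_1 \rho_1 + \alpha_2 + \ldots + \alpha_h \rho_{h-2} = 0$$

and so on.

If we multiply (47.65) by $u_{t+k}$, $k \ge 0$, and take expectations, we obtain

$$\rho_k + \alpha_1 \rho_{k+1} + \ldots + \alpha_h \rho_{k+h} = \frac{\operatorname{var} \varepsilon}{\operatorname{var} u}\, \beta_k \tag{47.67}$$

where the $\beta$'s are given by (47.61). These equations are due to Wold (1938).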

Example 47.7 The Markoff series 
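Consider again the series of Example 47.2. This is the simplest case of an autoregressive
series (apart from the trivial case $h = 0$):

$$u_t + \alpha_1 u_{t-1} = \varepsilon_t$$

which, for convenience, we shall write as

$$u_t - \rho u_{t-1} = \varepsilon_t. \tag{47.68}$$

This is known as a Markoff series.
We have

$$\frac{1}{1 - \rho D} = 1 + \rho D + \rho^2 D^2 + \ldots$$

and hence

$$\beta_j = \rho^j. \tag{47.69}$$

Hence

$$\operatorname{var} u = \operatorname{var} \Big\{ \sum_{j=0}^{\infty} \rho^j \varepsilon_{t-j} \Big\} = \operatorname{var} \varepsilon \sum_{j=0}^{\infty} \rho^{2j} = \frac{\operatorname{var} \varepsilon}{1 - \rho^2}. \tag{47.70}$$

From the Yule-Walker equations (47.66) with $h = 1$ we have

$$\rho_j - \rho \rho_{j-1} = 0, \quad j > 0,$$

and in particular

$$\rho_1 = \rho. \tag{47.71}$$

(This is, in fact, why we named as $\rho$ the parameter $-\alpha_1$.) Further,

$$\rho_k = \rho^k. \tag{47.72}$$

Fig. 47.4. Correlogram (left: $\rho_k$ against $k$) and spectrum (right: $w(\alpha)$ against $\alpha$ from 0 to $\pi$) of the Markoff series

For the spectral density we have

$$w(\alpha) = \sum_{j=-\infty}^{\infty} \rho^{|j|} e^{i\alpha j} = \frac{1}{1 - \rho e^{i\alpha}} + \frac{1}{1 - \rho e^{-i\alpha}} - 1 = \frac{1 - \rho^2}{1 - 2\rho \cos \alpha + \rho^2}. \tag{47.73}$$

The correlogram and the power spectrum are shown in Fig. 47.4.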
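
The closed form (47.73) may be compared with a truncated version of the defining sum. The sketch below is illustrative only: the value $\rho = 0.6$, the truncation at 200 terms and the function names are arbitrary choices.

```python
import math


def markoff_density_closed(rho, alpha):
    """Closed form (47.73): (1 - rho^2) / (1 - 2 rho cos(alpha) + rho^2)."""
    return (1 - rho ** 2) / (1 - 2 * rho * math.cos(alpha) + rho ** 2)


def markoff_density_series(rho, alpha, terms=200):
    """Truncated sum 1 + 2 * sum_{k >= 1} rho^k cos(k alpha), using rho_k = rho^k."""
    return 1 + 2 * sum(rho ** k * math.cos(k * alpha) for k in range(1, terms + 1))


for alpha in (0.0, 0.5, 1.5, math.pi):
    print(alpha, markoff_density_closed(0.6, alpha), markoff_density_series(0.6, alpha))
```

For positive $\rho$ the density is largest at $\alpha = 0$, where it equals $(1+\rho)/(1-\rho)$, and falls steadily to $(1-\rho)/(1+\rho)$ at $\alpha = \pi$.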


Example 47.8 The Yule series 
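The next most complex form of linear autoregressive series is known by the name
of Yule and is given by

$$u_t + \alpha_1 u_{t-1} + \alpha_2 u_{t-2} = \varepsilon_t. \tag{47.74}$$

From the first two Yule-Walker equations (47.66) we have

$$\rho_1 + \alpha_1 + \alpha_2 \rho_1 = 0$$

$$\rho_2 + \alpha_1 \rho_1 + \alpha_2 = 0,$$

giving

$$\rho_1 = -\frac{\alpha_1}{1 + \alpha_2} \tag{47.75}$$

$$\rho_2 = -\alpha_2 + \frac{\alpha_1^2}{1 + \alpha_2} \tag{47.76}$$

or

$$\alpha_1 = -\frac{\rho_1 (1 - \rho_2)}{1 - \rho_1^2} \tag{47.77}$$

$$\alpha_2 = \frac{1 - \rho_2}{1 - \rho_1^2} - 1 = \frac{\rho_1^2 - \rho_2}{1 - \rho_1^2}. \tag{47.78}$$

These equations give the parameters $\alpha_1$, $\alpha_2$ in terms of the first two autocorrelations
and vice versa. More generally, if $\mu$, $\nu$ are the roots of

$$x^2 + \alpha_1 x + \alpha_2 = 0$$

then $\rho_j = A\mu^j + B\nu^j$,
subject to initial conditions

$$\rho_0 = 1 = A + B$$

$$\rho_1 = A\mu + B\nu.$$

We then find

$$\rho_j = \frac{(1 - \nu^2)\mu^{j+1} - (1 - \mu^2)\nu^{j+1}}{(\mu - \nu)(1 + \mu\nu)}. \tag{47.79}$$

We can put this in a slightly more convenient form. Put

$$\mu = p e^{i\theta}, \qquad \nu = p e^{-i\theta}. \tag{47.80}$$

We find

$$p = \sqrt{\alpha_2}, \qquad \cos \theta = -\frac{\alpha_1}{2\sqrt{\alpha_2}}, \tag{47.81}$$

and (47.79) reduces to

$$\rho_j = \frac{p^j \sin(j\theta + \psi)}{\sin \psi} \tag{47.82}$$

where

$$\tan \psi = \frac{1 + p^2}{1 - p^2} \tan \theta. \tag{47.83}$$

The spectral density function is given by

$$w(\alpha) = \frac{(1 - \alpha_2)\{(1 + \alpha_2)^2 - \alpha_1^2\}}{(1 + \alpha_2)\{1 + \alpha_1^2 + \alpha_2^2 - 2\alpha_2 + 2\alpha_1(1 + \alpha_2) \cos \alpha + 4\alpha_2 \cos^2 \alpha\}}. \tag{47.84}$$

Fig. 47.5 shows a typical correlogram and power spectrum.

Fig. 47.5. Correlogram (left) and spectrum (right: $w(\alpha)$ against $\alpha$) of the Yule series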
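
The damped-sinusoid form (47.82) can be checked against the Yule-Walker recursion. The sketch below uses the coefficients of the scheme of Table 47.4, which in the notation of (47.74) are $\alpha_1 = -1.1$, $\alpha_2 = 0.5$; the function names are illustrative, and the closed form as coded assumes complex roots, i.e. $\alpha_1^2 < 4\alpha_2$.

```python
import math


def yule_correlogram(a1, a2, n):
    """rho_0..rho_n from (47.75) and the recursion rho_k = -a1 rho_{k-1} - a2 rho_{k-2}."""
    rho = [1.0, -a1 / (1 + a2)]
    for _ in range(2, n + 1):
        rho.append(-a1 * rho[-1] - a2 * rho[-2])
    return rho


def yule_closed_form(a1, a2, j):
    """rho_j = p^j sin(j theta + psi) / sin(psi), from (47.81) to (47.83)."""
    p = math.sqrt(a2)
    theta = math.acos(-a1 / (2 * p))
    psi = math.atan2((1 + p * p) * math.sin(theta), (1 - p * p) * math.cos(theta))
    return p ** j * math.sin(j * theta + psi) / math.sin(psi)


def yule_parameters(rho1, rho2):
    """Invert the first two autocorrelations, as in (47.77) and (47.78)."""
    a1 = -rho1 * (1 - rho2) / (1 - rho1 ** 2)
    a2 = (rho1 ** 2 - rho2) / (1 - rho1 ** 2)
    return a1, a2


rho = yule_correlogram(-1.1, 0.5, 20)
for j in (0, 1, 2, 5, 10):
    print(j, round(rho[j], 6), round(yule_closed_form(-1.1, 0.5, j), 6))
print(yule_parameters(rho[1], rho[2]))
```

With these coefficients the damping factor is $p = \sqrt{0.5}$ and the correlogram oscillates with a period of $2\pi/\theta$, a little over nine terms.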


Example 47.9 

We return to the discussion at the end of 47.18. In a linear scheme of the autoregressive
type, if the roots of the characteristic equation do not lie inside the unit
circle, the process is not ergodic. If some lie outside the circle the series "explodes."
It may still oscillate, but with ever-increasing amplitude.

If a root lies on the unit circle, the general solutions do not damp away, but provide
harmonic terms. The particular solution is then a "wandering series." Consider,
for example, the simple case $u_t = u_{t-1} + \varepsilon_t$. Clearly the particular solution is

$$u_t = \sum_{j=0}^{\infty} \varepsilon_{t-j}$$

and the variance of $u$ increases without limit.


Example 47.10 

Consider the Yule scheme in the limiting case $\alpha_2 = 1$. The coefficient $p$ of (47.81)
then is unity and the correlogram (47.82) does not damp. The system, in fact, ceases
to be ergodic.

47.20 The type of series exemplified by

$$u_t = \xi_t + \varepsilon_t, \tag{47.85}$$

where $\xi_t$ is a deterministic harmonic term, may be regarded as a harmonic with superposed
error. It is sometimes known as a scheme of hidden periodicities.
There is a somewhat different type of process to which the name harmonic is sometimes
given, though it is of no practical importance. A series may consist of the sum
of a number of harmonics, say

$$u_t = \sum_j A_j \cos(\alpha_j t), \tag{47.86}$$


where the $\alpha$'s are fixed but the $A$'s may vary from one realization to another. In this
case there will be certain linear relations of the Yule-Walker type

$$\sum_{j=0}^{h} a_j \rho_{k-j} = 0. \tag{47.87}$$

Continuous series 


47.21 Up to this point we have been concerned with series which are defined or 
observed at a set of discrete points. Some series, as we noted in Chapter 45, have a 
continuous existence in time, and there are even situations where we can form a con- 
tinuous record, as for example in the devices which graph temperature on a rotating 
drum. The fact that matter is ultimately discontinuous (if it is a fact) does not prevent 
us from regarding this record as continuous. 

For series which are defined by deterministic continuous functions, such as poly- 
nomials or trigonometric functions, this correspondence between the assumed con- 
tinuity of reality and the defined continuity of mathematics rarely causes any conceptual 
difficulty. But when we come to series of the stationary type in which there are jumps 
between successive points, expressed by random variables, we must consider this 
question of continuity more closely. Can we, in fact, have a continuous series which 
proceeds by random jumps, however small? Our own opinion is that we cannot; 
that there is something essentially antithetic between randomness and continuity. Any 
tendency, then, to take the mathematician’s customary leap from the discontinuous to 
the continuous case must be carefully controlled. It may well prove possible, of 
course, to approximate to discontinuous expressions by continuous ones, for example, 
to represent sums by integrals; but we must not forget the problems of interpretation 
which are involved. 


47.22 To deal with this subject rigorously requires a theory of stochastic integration 
which would take us beyond the scope of this book. But we may expound the basic 
results in an intuitive way as follows. 

Consider a continuous series $u(t)$ defined in some interval $-h$ to $h$. Taking the
mean to be zero, which does not seriously limit our generality, we may define the variance
as

$$\operatorname{var} u = \frac{1}{2h} \int_{-h}^{h} u^2(t) \, dt. \tag{47.88}$$

If this has a limiting value as $h$ tends to infinity (as it will for a stationary series) the
variance is defined over an infinite range. Likewise we have an autocovariance function,
and if we standardize by division by var $u$, we have the autocorrelation function

$$\rho(k) = \lim_{h \to \infty} \frac{1}{2h \operatorname{var} u} \int_{-h}^{h} u(t)\, u(t+k) \, dt. \tag{47.89}$$

Consider the transform of the autocorrelation function, say $\phi_\rho(p)$, defined by

$$\phi_\rho(p) = \int_{-\infty}^{\infty} \rho(k) e^{ikp} \, dk = \lim \frac{1}{2h \operatorname{var} u} \iint u(t)\, u(t+k)\, e^{ikp} \, dt \, dk$$

$$= \lim \frac{1}{2h \operatorname{var} u} \iint u(t) e^{-ipt}\, u(t+k) e^{ip(t+k)} \, dt \, dk. \tag{47.90}$$

Putting $q = t + k$, we reduce this to

$$\lim \frac{1}{2h \operatorname{var} u} \iint u(t) e^{-ipt}\, u(q) e^{ipq} \, dt \, dq
= \lim \frac{1}{2h \operatorname{var} u} \int u(t) e^{-ipt} \, dt \int u(q) e^{ipq} \, dq. \tag{47.91}$$

Hence, if the transform of the series $u(t)$ is given by

$$\alpha(p) = \frac{1}{\sqrt{2h \operatorname{var} u}} \int_{-h}^{h} u(t) e^{ipt} \, dt = a(p) + i b(p), \tag{47.92}$$

we have, on letting $h$ tend to infinity in (47.89), for the transform of the autocorrelation
function

$$\phi_\rho(p) = a^2(p) + b^2(p) = |\alpha(p)|^2. \tag{47.93}$$

$\phi_\rho(p)$ is the continuous extension of the spectral density which (cf. (47.21)) is the
Fourier transform of the autocorrelations.


47.23 It is to be noted that, even for a continuous function defined over an infinite
interval, the autocorrelation function does not determine the series $u(t)$ uniquely. In
fact, given $\phi_\rho(p)$, we have from (47.93)

$$\alpha(p) = \sqrt{\phi_\rho(p)}\; e^{i\psi(p)} \tag{47.94}$$

where $\psi$ is any arbitrary real function. We shall then have, on inverting for $u$,

$$u(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \alpha(p) e^{-ipt} \, dp
= \frac{1}{2\pi} \int_{-\infty}^{\infty} \sqrt{\phi_\rho(p)}\; e^{i\{\psi(p) - pt\}} \, dp. \tag{47.95}$$

Since $u(t)$ must be real, the imaginary part of the integral vanishes and we have

$$u(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \sqrt{\phi_\rho(p)} \cos\{\psi(p) - pt\} \, dp, \tag{47.96}$$

a result due to Wiener (1930). Hence $\psi$ must be an odd function of $p$ but is otherwise
(subject to convergence) arbitrary. Hence $\phi_\rho$ does not uniquely determine $u(t)$. We
shall consider this from the point of view of spectral functions later.

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Example 47.11 
Consider an autocorrelation function of the type we have discussed for the Yule series at (47.82),

$$\rho_k = \frac{p^k\sin(k\theta+\psi)}{\sin\psi}.$$

Consider what happens if we regard this as defined for all k, not merely integral values. We may, with a slight change of notation, put

$$\rho(k) = \frac{e^{-ak}\sin(k\theta+\psi)}{\sin\psi}.$$

When k is negative we must use |k| in this expression.
For the transform we have

$$\phi_\rho(p) = \int_{-\infty}^{\infty}\frac{e^{-a|k|}\sin(|k|\theta+\psi)}{\sin\psi}\,e^{ikp}\,dk \tag{47.97}$$

$$= \frac{a+(\theta+p)\cot\psi}{a^2+(\theta+p)^2}+\frac{a+(\theta-p)\cot\psi}{a^2+(\theta-p)^2}. \tag{47.98}$$

The variable p in the transform here is not to be confused with the damping factor p in $\rho_k$.

It is to be noted in (47.98) that this spectrum is continuous with a maximum at 
p = 0. The physicist would be tempted to regard this spectrum as analogous to 
that of white light, every frequency being represented. It does not follow, however, 
that the series u(t) arises as the sum of a large number of harmonic terms with all 
possible frequencies, in the way that white light can be regarded as the composition 
of a number of resonators oscillating on all wavelengths. 

On this question of white light, let us consider the limiting case when a time-series
is defined at a series of small intervals Δt and all autocorrelations are zero. We then
find, from (47.21), that the spectral density (on a scale with unit time-intervals) is
unity, or on a scale Δt would be approximately Δt/2π, namely a constant. Certain
physical systems do give rise to constant spectral densities, or to a series of equal
ordinates very close together. The communications engineer describes the situation
as one of white noise. Since he is trying to transmit signals on a determined frequency
this so-called noise is a nuisance (like that in a radio set) which affects his reception
and acts as an error-like disturbance to the purity of the incoming signals.


Filters and transfer functions 
47.24 Suppose that we have a series u(t) and a system of weights a(τ). We may form what is, in effect, a linear weighted average v(t) by the formula

$$v(t) = \int_0^{\infty}a(\tau)\,u(t-\tau)\,d\tau. \tag{47.99}$$

This average is over past values of u(t), including the present value, and does not look to the future. If u(t) is defined at discontinuous points a similar sum may be defined. For the spectrum, we consider

$$\int v(t)e^{ipt}\,dt = \int_0^{\infty}a(\tau)\,d\tau\int u(t-\tau)e^{ipt}\,dt$$
$$= \int_0^{\infty}a(\tau)e^{ip\tau}\,d\tau\int u(t-\tau)e^{ip(t-\tau)}\,dt = \phi_u(p)\int_0^{\infty}a(\tau)e^{ip\tau}\,d\tau.$$

Hence, taking the squares of the moduli,

$$|\phi_v(p)|^2 = |\phi_u(p)|^2\left|\int_0^{\infty}a(\tau)e^{ip\tau}\,d\tau\right|^2. \tag{47.100}$$

The function

$$T(p) = \int_0^{\infty}a(\tau)e^{ip\tau}\,d\tau \tag{47.101}$$

is sometimes called the frequency response function or transfer function. It is, in
essence, the c.f. (Fourier transform) of the weighting function a(τ). (47.93) and (47.100)
show that the spectral density of the derived series is obtained from that of the original
series on multiplication by the square of the modulus of the transfer function. The
engineer would regard the system as an incoming series, u(t), modified by some 
mechanism equivalent to a linear average, to give the output v(t). 
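
A discrete analogue may help to fix ideas. The following is only a sketch: it assumes a finite set of weights a_0, a_1, ..., a_m applied at unit intervals (so that the integral of (47.101) becomes a sum), and the function names are ours, chosen for illustration. It evaluates T(p) and the factor |T(p)|² by which the spectral density of the input is multiplied.

```python
import cmath
import math


def transfer_function(weights, p):
    """Discrete analogue of (47.101): T(p) = sum_j a_j * exp(i*p*j)."""
    return sum(a * cmath.exp(1j * p * j) for j, a in enumerate(weights))


def power_gain(weights, p):
    """Squared modulus |T(p)|^2 multiplying the input spectral density."""
    return abs(transfer_function(weights, p)) ** 2


if __name__ == "__main__":
    # A simple three-term average (hypothetical weights, for illustration).
    simple_average = [1 / 3, 1 / 3, 1 / 3]
    for p in (0.0, math.pi / 3, 2 * math.pi / 3, math.pi):
        print(f"p = {p:.3f}  |T|^2 = {power_gain(simple_average, p):.4f}")
```

For the three-term simple average the gain is unity at p = 0 and vanishes at p = 2π/3, so that an oscillation of period three is removed entirely while slow movements pass almost unchanged.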


47.25 Within limits, it is possible to choose the transfer function so as to produce
from u(t) an output v(t) with emphasis on particular frequencies. Such a function, or
rather the set of weights a(τ), is then called a filter. This is not the happiest expression,
a filter removing impurities by withholding them, rather than transforming them; but it 
will serve. We need not, and shall not, confine our usage to averages which extend 
over the past, as in (47.99). Thus, the ordinary moving averages which we considered 
in Chapter 46 are filters in this sense. 


Partial autocorrelations 
47.26 If we write the linear autoregressive scheme in the form (47.58)

$$u_t = -\alpha_1u_{t-1}-\alpha_2u_{t-2}-\dots-\alpha_hu_{t-h}+\varepsilon_t, \tag{47.102}$$

we may regard it as a kind of predictive equation for $u_t$, which will then depend on two
factors, the systematic terms in $u_{t-j}$ which, as it were, express the effect on $u_t$ of its
own past history, and the random element $\varepsilon_t$ which can be considered as a disturbance.
We may then ask the questions which are usually posed in ordinary regression analysis:
given the autocorrelations $\rho_j$, what are the partial correlations expressing the dependence
of $u_t$ on previous terms when the effect of other intermediate previous terms has been
removed?


47.27 Consider first of all the Markoff scheme (47.68),

$$u_t = \rho u_{t-1}+\varepsilon_t. \tag{47.103}$$

We know from (47.72) that $\rho_k = \rho^k$. Let us calculate the partial autocorrelation $\rho_{13.2}$, expressing the dependence of $u_t$ on $u_{t-2}$ apart from the intervention of $u_{t-1}$. We have, in an obvious notation (cf. (27.5)),

$$\rho_{13.2} = \frac{\rho_{13}-\rho_{12}\rho_{23}}{(1-\rho_{12}^2)^{1/2}(1-\rho_{23}^2)^{1/2}}$$

with $\rho_{13} = \rho^2$, $\rho_{12} = \rho_{23} = \rho$. Thus

$$\rho_{13.2} = 0. \tag{47.104}$$

It will easily be seen that, in fact, all the partial autocorrelations of lag greater than one are zero. This is otherwise obvious from the fact that in (47.103) the regression of $u_t$ concerns only $u_{t-1}$ as independent variable.

In short, for a Markoff scheme all partial autocorrelations beyond the first vanish. The term $u_t$, so far as it depends systematically on previous terms, is entirely explained by $u_{t-1}$.
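
The vanishing of the lag-two partial autocorrelation can be checked numerically. The sketch below merely evaluates the ordinary first-order partial correlation formula with the Markoff values ρ_12 = ρ_23 = ρ and ρ_13 = ρ²; the function names are ours.

```python
import math


def partial_corr_13_2(r12, r13, r23):
    """First-order partial correlation rho_{13.2} from the three total correlations."""
    return (r13 - r12 * r23) / math.sqrt((1 - r12 ** 2) * (1 - r23 ** 2))


def markoff_partial_lag2(rho):
    """Lag-two partial autocorrelation of a Markoff scheme with rho_k = rho**k."""
    return partial_corr_13_2(rho, rho ** 2, rho)


if __name__ == "__main__":
    for rho in (-0.8, 0.3, 0.9):
        print(rho, markoff_partial_lag2(rho))
```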


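47.28 Now consider the Yule scheme (47.74),

$$u_t = -\alpha_1u_{t-1}-\alpha_2u_{t-2}+\varepsilon_t. \tag{47.105}$$

We have from (47.75) and (47.76)

$$\rho_1 = -\frac{\alpha_1}{1+\alpha_2},\qquad \rho_2 = -\alpha_2+\frac{\alpha_1^2}{1+\alpha_2}.$$

The partial correlation between $u_t$ and $u_{t+2}$ is given by

$$\rho_{13.2} = \frac{\rho_2-\rho_1^2}{1-\rho_1^2} = -\alpha_2, \tag{47.106}$$

as we might expect.
We can easily check that higher-order partials vanish. For example, the numerator of $\rho_{14.23}$ is $\rho_{14.3}-\rho_{12.3}\rho_{42.3}$. This in turn has a numerator

$$(1-\rho_1^2)(\rho_3-\rho_1\rho_2)-(\rho_1-\rho_1\rho_2)(\rho_2-\rho_1^2). \tag{47.107}$$

Considering the determinant of the first three Yule-Walker equations (47.66), we have

$$\begin{vmatrix}\rho_1 & 1 & \rho_1\\ \rho_2 & \rho_1 & 1\\ \rho_3 & \rho_2 & \rho_1\end{vmatrix} = 0.$$

This, expanded by the first column, shows that the expression (47.107) vanishes.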

Such results are, of course, obvious from the general theory of regression when 
we recall that the regressor variables and the regressand all have the same variance, so 
that the coefficients in the regression equation, being standardized, are equal to the 
partial correlations. 
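
These statements are easily verified numerically. The sketch below is ours and not part of the original argument: it generates the autocorrelations of a Yule scheme from the Yule-Walker recursion and then computes partial autocorrelations by the Durbin-Levinson recursion, a standard device which we substitute for the explicit partial-correlation formulae of the text. The coefficient values in the example are arbitrary.

```python
def yule_autocorrelations(alpha1, alpha2, max_lag):
    """rho_0..rho_max_lag for u_t = -alpha1*u_{t-1} - alpha2*u_{t-2} + eps_t."""
    rho = [1.0, -alpha1 / (1.0 + alpha2)]
    for _ in range(2, max_lag + 1):
        # Yule-Walker recursion: rho_k = -alpha1*rho_{k-1} - alpha2*rho_{k-2}
        rho.append(-alpha1 * rho[-1] - alpha2 * rho[-2])
    return rho[: max_lag + 1]


def partial_autocorrelations(rho):
    """Partial autocorrelations of lags 1..len(rho)-1 (Durbin-Levinson recursion)."""
    pacf = []
    prev = []  # regression coefficients of the scheme of order k-1
    for k in range(1, len(rho)):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


if __name__ == "__main__":
    rho = yule_autocorrelations(-1.1, 0.5, 5)
    print(partial_autocorrelations(rho))
```

With α_1 = −1.1 and α_2 = 0.5 the second partial autocorrelation comes out as −α_2 = −0.5, in agreement with (47.106), and those of lag three and above vanish to rounding error.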


Infinite, semi-infinite and circular processes 
47.29 If we consider the linear autoregressive scheme

$$\sum_{j=0}^{h}\alpha_ju_{t-j} = \varepsilon_t \tag{47.108}$$

as generating a series of values of u, given those of ε, we are faced with a difficulty, or
rather, with the necessity for making a decision. We cannot find the value of u at
some point, say T, without knowing those for T−1, T−2, ..., T−h. We may suppose
that these values are given, or otherwise known, for some $T_0$. From that point
onwards the series is ascertainable and we may say that it is semi-infinite, because it is
considered as extending to infinity in one direction, that of increasing t.
On the other hand, we may regard the series as extending back into the infinite past,
as well as forward into the infinite future. We do not then know the "starting up"
values, if any. The series may be said to be infinite. 
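
The semi-infinite case is the one met in any actual computation: given h starting values and a sequence of disturbances, the series is built up term by term from (47.108) with α_0 = 1. A minimal sketch follows; the function name and argument layout are our own choices.

```python
def generate_autoregressive(alphas, start_values, disturbances):
    """Generate u_t from sum_{j=0}^{h} alpha_j u_{t-j} = eps_t, with alpha_0 = 1.

    alphas        -- [alpha_1, ..., alpha_h]
    start_values  -- the h known values preceding the first generated term (oldest first)
    disturbances  -- the successive values of eps_t
    """
    h = len(alphas)
    if len(start_values) != h:
        raise ValueError("need exactly h starting values")
    series = list(start_values)
    for eps in disturbances:
        past = series[-1:-h - 1:-1]  # u_{t-1}, ..., u_{t-h}
        # u_t = eps_t - alpha_1 u_{t-1} - ... - alpha_h u_{t-h}
        series.append(eps - sum(a * u for a, u in zip(alphas, past)))
    return series[h:]


if __name__ == "__main__":
    # Markoff scheme with rho = 0.5 (alpha_1 = -0.5), started at u = 1, no disturbances.
    print(generate_autoregressive([-0.5], [1.0], [0.0, 0.0, 0.0]))
```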


47.30 For certain mathematical reasons which will become clear, we may also
define a circular process on lines analogous to those we have used in defining circular
coefficients (45.34). We suppose, in fact, that for some N

$$u_{t+N} = u_t,\quad u_{t+N+1} = u_{t+1},\quad \dots,\quad u_{t+N+h} = u_{t+h}.$$
It is, so far as we can see, impossible to imagine physical processes which generate 
such a system. We shall therefore avoid it as far as we can. The best that can be 
said about it is that, should we be able to derive results for the circular process in an 
exact form, there is some expectation that, by letting N tend to infinity, we may derive 
at least approximations to the results for the semi-infinite case. But even this is doubt- 
ful; one does not avoid a difficulty by banishing it to infinity. We shall therefore need 
to be very careful in interpreting results for the circular process. 


47.31 The theoretical forms of correlograms exemplified in Figs. 47.4 and 47.5 
are not followed very closely by the observed correlograms of short series ("short"
in this context meaning anything up to 100 terms). Exercises 47.20-22 give the
observed correlations of Tables 47.1, 47.2 and 47.4. Apart from irregularities such as 
might be expected from sampling effects, there are two other phenomena encountered 
in practice: (a) the serial correlations are biassed downwards, and (b) the correlations 
of higher order do not damp out for Yule and Markoff schemes as quickly as might be 
expected. A theoretical explanation of these effects will be given in the following 
chapter. The problem of how to fit schemes of various kinds to stationary series and 
how to test hypotheses concerning them will be considered in Chapter 50. 


EXERCISES 
47.1 In a stationary series with $\rho_1 = \rho_2 = \rho$, show that $\rho \geq -\tfrac12$ and that

$$\rho_3 \geq \frac{4\rho^2-\rho-1}{\rho+1}.$$

47.2 For the Markoff series

$$u_t = \rho u_{t-1}+\varepsilon_t,$$

show that the cumulants of u are connected with those of ε by

$$\kappa_r(u) = \kappa_r(\varepsilon)/(1-\rho^r).$$

Hence show that for the standardized coefficients $\lambda_r = \kappa_r/\kappa_2^{r/2}$,

$$\lambda_r(u) = \lambda_r(\varepsilon)\,\frac{(1-\rho^2)^{r/2}}{1-\rho^r}.$$

Deduce that in general u is closer to normality than ε, but that it is not normal unless ε is normal.

47.3 Show that the Markoff scheme of the previous exercise can be written
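$$u_t = \rho u_{t+1}+\eta_t$$

with

$$\eta_t = -\rho\varepsilon_{t+1}+(1-\rho^2)\sum_{j=0}^{\infty}\rho^j\varepsilon_{t-j}.$$

Hence show that if ε is a random variable,

$$\operatorname{var}\eta = \operatorname{var}\varepsilon,\qquad \operatorname{cov}(\eta_t,\eta_{t+k}) = 0,\quad k\neq 0.$$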


47.4 Verify that for the spectral density function (47.73)

$$\int_0^{\pi}w(\alpha)\,d\alpha = \pi.$$


47.5 Verify that for the Yule autoregressive scheme equation (47.84) is true and that the
integral of $w(\alpha)$ over the range 0 to π is equal to π.


47.6 Show that there exist four and only four moving averages of a random series with
correlograms

$$\rho_1 = -\tfrac{42}{85},\quad \rho_2 = \tfrac{4}{17},\quad \rho_3 = -\tfrac{8}{85},\quad \rho_4 = \rho_5 = \text{etc.} = 0.$$

These are $\tfrac15[8, -4, 2, -1]$, $\tfrac15[-1, 2, -4, 8]$, $\tfrac15[2, -1, 8, -4]$, $\tfrac15[-4, 8, -1, 2]$.
(Wold, 1938)


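47.7 In the general autoregressive scheme with random term ε, show that

$$\operatorname{var}\left(\sum_{j=0}^{h}\alpha_ju_{t-j}\right) = \operatorname{var}\varepsilon.$$


47.8 For the Yule autoregressive scheme show that

$$\frac{\operatorname{var}u}{\operatorname{var}\varepsilon} = \frac{1+\alpha_2}{(1-\alpha_2)\{(1+\alpha_2)^2-\alpha_1^2\}}.$$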


47.9 Show that the autocorrelations of the mth differences of a random series are given by

$$\rho_j = (-1)^j\frac{(m!)^2}{(m+j)!\,(m-j)!}.$$

47.10 If any series is fitted by a Yule scheme with autocorrelations $\rho_j$ show that the autocorrelations of the residuals, say $\varrho_j$, are given by

$$\varrho_j = \frac{(1+\alpha_1^2+\alpha_2^2)\rho_j+\alpha_1(1+\alpha_2)(\rho_{j+1}+\rho_{j-1})+\alpha_2(\rho_{j+2}+\rho_{j-2})}{1+\alpha_1^2+\alpha_2^2+2\alpha_1(1+\alpha_2)\rho_1+2\alpha_2\rho_2}.$$


47.11 Show that a series for which the autocorrelation function is

$$\rho(j) = (\sin\lambda j)/\lambda j$$

has a continuous spectrum with a jump at the point λ.


47.12 Show that any linear autoregressive series can be represented as a combined sequence
of Yule and Markoff series in which the error term ε of one is the series-value of the next.

47.13 If A, B are defined as
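$$A = \frac{2}{n}\sum_{j=1}^{n}u_j\cos\beta j,\qquad B = \frac{2}{n}\sum_{j=1}^{n}u_j\sin\beta j,$$

show that, if $u_t = a\sin\alpha t+b_t$, where $b_t$ is a component uncorrelated with $\sin\alpha t$,

$$A = \frac{a}{n}\left\{\frac{\sin\{\tfrac12n(\alpha-\beta)\}\sin\{\tfrac12(n+1)(\alpha-\beta)\}}{\sin\{\tfrac12(\alpha-\beta)\}}+\frac{\sin\{\tfrac12n(\alpha+\beta)\}\sin\{\tfrac12(n+1)(\alpha+\beta)\}}{\sin\{\tfrac12(\alpha+\beta)\}}\right\}$$

with a similar expression for B. Hence show that $S^2 = A^2+B^2$ remains small as n increases
unless α−β is small, in which case $S^2 = a^2$.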


47.14 For a "continuous" series with autocovariance function

$$\rho(t) = e^{-a|t|}$$

show that the spectral density is given by

$$w(p) = \frac{1}{\pi}\,\frac{a}{a^2+p^2}.$$

(Cf. the characteristic function of a Cauchy distribution.)


47.15 A series obeys the relation

$$u_t = u_{t-1}+\varepsilon_t$$

where $\varepsilon_t$ is a random series with unit variance. It is divided into consecutive groups of m
terms and the arithmetic mean of each group determined, say as $v_i$. Show that

$$\operatorname{var}(\Delta v_i) = (2m^2+1)/3m$$

and that the correlation between successive values of $\Delta v_i$ is

$$\frac{m^2-1}{2(2m^2+1)}.$$

(Working, 1960)
47.16 Let U be the N × N matrix

$$U = \begin{pmatrix}0&1&0&0&\dots&0\\0&0&1&0&\dots&0\\0&0&0&1&\dots&0\\\vdots&&&&&\vdots\\0&0&0&0&\dots&1\\0&0&0&0&\dots&0\end{pmatrix}.$$

Show that successive powers of U are of a similar form with the diagonal of unities displaced
to the right, and that $U^N = 0$. Hence show that if u is a column vector $(u_1, u_2, \dots, u_N)$ the
autoregressive scheme may be written

$$\sum_{j=0}^{h}\alpha_jU^ju = \varepsilon.$$

Further show that for the dispersion matrix of u

$$V(u) = \left(\sum\alpha_jU^j\right)^{-1}\left(\sum\alpha_jU'^j\right)^{-1}.$$

(Whittle, 1951)


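47.17 Show further that the inverse of the dispersion matrix of u is given by

$$V^{-1} = \left(\sum\alpha_jU'^j\right)\left(\sum\alpha_jU^j\right)$$

and hence that for a third-order linear autoregressive series with random errors the inverse
of the autodispersion matrix is given by

$$V^{-1} = \begin{pmatrix}1 & \alpha_1 & \alpha_2 & \alpha_3 & \dots\\ \alpha_1 & 1+\alpha_1^2 & \alpha_1(1+\alpha_2) & \alpha_2+\alpha_1\alpha_3 & \dots\\ \alpha_2 & \alpha_1(1+\alpha_2) & 1+\alpha_1^2+\alpha_2^2 & \alpha_1+\alpha_1\alpha_2+\alpha_2\alpha_3 & \dots\\ \alpha_3 & \alpha_2+\alpha_1\alpha_3 & \alpha_1+\alpha_1\alpha_2+\alpha_2\alpha_3 & 1+\alpha_1^2+\alpha_2^2+\alpha_3^2 & \dots\\ & & \text{etc.} & & \end{pmatrix}$$

(J. Wise, 1955)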


47.18 Show that although the autocorrelation matrix of a series is of the Laurent type, 
its inverse is not. 


(Whittle, 1951) 


47.19 Referring to equation (47.63), show, by expanding the left-hand side in terms of 
partial fractions, that У f, converges if the roots of (47.62) are all different and lie within the 


unit circle. 


47.20 The following are the serial correlations of the data of Table 47.1 (wheat prices).
Draw the correlogram.

Order of correlation k and serial correlation r_k:

    k      r_k        k      r_k        k      r_k        k    r_k
    1     0.562      16     0.158      31     0.060      46
    2     0.103      17     0.109      32    -0.008      47
    3    -0.075      18     0.002      33    -0.039      48
    4    -0.092      19    -0.075      34     0.007      49
    5    -0.082      20    -0.062      35     0.056      50
    6    -0.136      21    -0.021      36     0.010      51
    7    -0.211      22    -0.062      37    -0.004      52
    8    -0.261      23    -0.088      38    -0.015      53
    9    -0.192      24    -0.084      39    -0.047      54
   10    -0.070      25    -0.076      40    -0.047      55
   11    -0.003      26    -0.091      41     0.008      56
   12    -0.015      27    -0.052      42     0.034      57
   13    -0.012      28    -0.032      43     0.065      58
   14     0.047      29    -0.012      44     0.099      59
   15     0.101      30     0.059      45     0.009      60
47.21 The following are the serial correlations of Table 47.2 (marriage rates). Draw the
correlogram.

Order of correlation k and serial correlation r_k:

    k      r_k        k      r_k
    1     0.563      11    -0.080
    2    -0.089      12    -0.136
    3    -0.498      13    -0.132
    4    -0.631      14    -0.058
    5    -0.467      15    -0.095
    6    -0.025      16    -0.126
    7     0.353      17    -0.036
    8     0.396      18     0.131
    9     0.254      19     0.209
   10     0.104      20     0.205

47.22 The following are the serial correlations of Table 47.4 (artificial series). Draw the
correlogram.

Order of correlation k and serial correlation r_k:

    k      r_k        k    r_k        k      r_k
    1     0.70       11              21     0.05
    2     0.29       12              22    -0.12
    3     0.01       13              23    -0.28
    4    -0.17       14              24    -0.43
    5    -0.27       15              25    -0.57
    6    -0.25       16              26    -0.56
    7    -0.13       17              27    -0.26
    8     0.07       18              28     0.02
    9     0.12       19              29     0.17
   10     0.05       20              30     0.27

Large-sample theory

48.1 We defined the serial correlation of lag k in 45.32 and remarked that, for
certain purposes, simpler forms of definition were mathematically and computationally
more convenient. For large n the definitions tend to equivalence. For large sample
theory we shall therefore consider the standard error of the form

$$r_j = \frac{\frac1n\sum u_iu_{i+j}}{\frac1n\sum u_i^2} = c/v, \text{ say}. \tag{48.1}$$

As usual, we may write parental or sample forms indifferently in the resulting expressions, and shall usually employ the autocorrelations $\rho_j$.
In accordance with the customary procedure we have

$$\operatorname{var}r_j = \frac{\operatorname{var}c}{v^2}-\frac{2c\operatorname{cov}(c,v)}{v^3}+\frac{c^2\operatorname{var}v}{v^4}$$

and, taking v = 1 without loss of generality,

$$\operatorname{var}r_j = \operatorname{var}c-2c\operatorname{cov}(c,v)+c^2\operatorname{var}v. \tag{48.2}$$

To evaluate this expression we will derive a general result concerning the covariance
of two covariance terms. We have

$$\operatorname{cov}\left(\frac1n\sum_a u_au_{a+s},\ \frac1n\sum_b u_bu_{b+s+t}\right) = \frac{1}{n^2}E\left(\sum_a u_au_{a+s}\sum_b u_bu_{b+s+t}\right)-\rho_s\rho_{s+t}$$
$$= \frac{1}{n^2}\sum_a\sum_b E(u_au_{a+s}u_bu_{b+s+t})-\rho_s\rho_{s+t}. \tag{48.3}$$
To evaluate the product-moments of order four in this expression we need some
further assumptions. Assume that the u's are jointly normally distributed so that their
characteristic function is of the form

$$\exp\{-\tfrac12(\theta_a^2+\theta_{a+s}^2+\theta_b^2+\dots+2\rho_s\theta_a\theta_{a+s}+\text{etc.})\}. \tag{48.4}$$

For the coefficient of $\theta_a\theta_{a+s}\theta_b\theta_{b+s+t}$ we find

$$\rho_s\rho_{s+t}+\rho_{b-a}\rho_{b-a+t}+\rho_{b-a+s+t}\rho_{b-a-s}.$$

On summing over a, b, we find, for the covariance (48.3),

$$\frac{1}{n^2}\left[n^2\rho_s\rho_{s+t}+n\sum_{i=-\infty}^{\infty}\rho_i\rho_{i+t}+n\sum_{i=-\infty}^{\infty}\rho_{i+s+t}\rho_{i-s}\right]-\rho_s\rho_{s+t}$$
$$= \frac1n\sum_{i=-\infty}^{\infty}(\rho_i\rho_{i+t}+\rho_{i+s+t}\rho_{i-s}). \tag{48.5}$$

Now let us specialize. Putting s = t = 0, we have
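$$\operatorname{var}v = \frac2n\sum_{i=-\infty}^{\infty}\rho_i^2. \tag{48.6}$$

Putting t = 0, we have

$$\operatorname{var}c = \frac1n\sum_{i=-\infty}^{\infty}(\rho_i^2+\rho_{i+s}\rho_{i-s}). \tag{48.7}$$

Putting s = 0 and replacing t by s, we have

$$\operatorname{cov}(c,v) = \frac2n\sum_{i=-\infty}^{\infty}\rho_i\rho_{i+s}. \tag{48.8}$$

Finally, substitution of these values in (48.2), with s = j and c replaced by $\rho_j$, gives us

$$\operatorname{var}r_j = \frac1n\sum_{i=-\infty}^{\infty}(\rho_i^2+\rho_{i+j}\rho_{i-j}-4\rho_j\rho_i\rho_{i+j}+2\rho_i^2\rho_j^2). \tag{48.9}$$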


The formula is due to Bartlett (1946). 
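
For numerical work the doubly infinite sum must be truncated. The sketch below assumes Bartlett's formula in the form var r_j = (1/n) Σ (ρ_i² + ρ_{i+j}ρ_{i−j} − 4ρ_jρ_iρ_{i+j} + 2ρ_i²ρ_j²), with the sum over all integral i; the function name and the truncation point `max_lag` are our own choices, the latter being adequate only when the autocorrelations damp out well before that lag.

```python
def bartlett_variance(rho, j, n, max_lag=500):
    """Truncated evaluation of Bartlett's large-sample formula for var r_j.

    rho     -- function giving the parent autocorrelation rho(k) for k >= 0
    j       -- lag of the serial correlation r_j
    n       -- length of the series
    max_lag -- truncation point of the doubly infinite sum (an assumption)
    """
    def r(k):
        return rho(abs(k))

    total = 0.0
    for i in range(-max_lag, max_lag + 1):
        total += (r(i) ** 2 + r(i + j) * r(i - j)
                  - 4.0 * r(j) * r(i) * r(i + j)
                  + 2.0 * r(i) ** 2 * r(j) ** 2)
    return total / n


if __name__ == "__main__":
    def random_series(k):
        return 1.0 if k == 0 else 0.0

    def markoff(k):
        return 0.5 ** k

    print(bartlett_variance(random_series, 3, 100))  # random series
    print(bartlett_variance(markoff, 1, 100))        # Markoff, lag 1
    print(bartlett_variance(markoff, 30, 100))       # Markoff, large lag
```

For a random series the function returns 1/n, and for a Markoff series with large j it approaches (1/n)(1+ρ²)/(1−ρ²).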


48.2 This result shows us that, even for large samples with the simplifying assumption
of normality, the variance of $r_j$ depends on all the autocorrelations of the series.
This is awkward, for we cannot estimate them all directly from a finite series. Some 
fair approximations can, however, be derived in the manner of the following examples. 


Example 48.1
Consider in the first place the simple case when all parent autocorrelations are zero
(a random series). We then find, from (48.9),

$$\operatorname{var}r_j = \frac1n. \tag{48.10}$$

This verifies that the variance is of order 1/n. In fact, the sampling formulae in this
case reduce to those of an ordinary correlation coefficient in bivariate normal samples,
as is evident from the fact that the series is random.


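Example 48.2
If $\rho_j$ and subsequent ρ's are small, (48.9) reduces to approximately

$$\operatorname{var}r_j = \frac1n\sum_{i=-\infty}^{\infty}\rho_i^2. \tag{48.11}$$

It may be verified (we leave this as Exercise 48.1) that on similar assumptions

$$\operatorname{cov}(r_j, r_{j+k}) = \frac1n\sum_{i=-\infty}^{\infty}\rho_i\rho_{i+k}. \tag{48.12}$$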
Example 48.3
If the series obeys the Markoff equation (47.68) with parameter ρ, we have from
(48.11) and (47.72) for large j

$$\operatorname{var}r_j = \frac1n\left(1+2\sum_{i=1}^{\infty}\rho^{2i}\right) = \frac1n\,\frac{1+\rho^2}{1-\rho^2}. \tag{48.13}$$

For more exact values for small j, see Exercise 48.8.

Example 48.4
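The approximate forms of (48.11) and (48.12) can be derived direct from the autocorrelation generating function. For, in the notation of 47.14,

$$w(\alpha) = G(z) = \sum_{j=-\infty}^{\infty}\rho_jz^j,$$

and hence

$$\{G(z)\}^2 = \sum_i\sum_j\rho_i\rho_jz^{i+j} = \sum_kz^k\left(\sum_i\rho_i\rho_{i+k}\right), \tag{48.14}$$

so that $G^2$ is a generating function for the sums required.
For example, with the Markoff process

$$G(z) = \frac{1}{1-\rho z}+\frac{1}{1-\rho z^{-1}}-1,$$

$$G^2(z) = 1+(1-\rho z)^{-2}+(1-\rho z^{-1})^{-2}-2(1-\rho z)^{-1}-2(1-\rho z^{-1})^{-1}+2(1-\rho z)^{-1}(1-\rho z^{-1})^{-1}.$$

We then find for the coefficient of $z^0$

$$1+1+1-2-2+2(1+\rho^2+\rho^4+\dots) = \frac{1+\rho^2}{1-\rho^2}$$

as at (48.13). Also the coefficient of $z^k$ is given by

$$(k+1)\rho^k-2\rho^k+2(\rho^k+\rho^{k+2}+\dots) = \frac{\rho^k\{(k+1)-(k-1)\rho^2\}}{1-\rho^2}.$$

Hence the covariance of $r_j$ and $r_{j+k}$ in a Markoff scheme is, by (48.12),

$$\frac1n\,\frac{\rho^k\{(k+1)-(k-1)\rho^2\}}{1-\rho^2} \tag{48.15}$$

and the correlation between them is, using (48.13),

$$\frac{\rho^k\{(k+1)-(k-1)\rho^2\}}{1+\rho^2}. \tag{48.16}$$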
The method may be extended to schemes of higher order (cf. Quenouille (1947a) and 
Exercise 48.14). 
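
The coefficient of z^k in G²(z) for the Markoff process may be checked against a direct summation. In the sketch below (our own, with an arbitrary truncation point) the closed form used is ρ^k{(k+1) − (k−1)ρ²}/(1−ρ²), which is what the expansion of G²(z) yields and which reduces to (1+ρ²)/(1−ρ²) at k = 0.

```python
def lagged_product_sum(rho, k, max_lag=400):
    """Truncated sum over i of rho_i * rho_{i+k}, with rho_m = rho**|m|."""
    def r(m):
        return rho ** abs(m)
    return sum(r(i) * r(i + k) for i in range(-max_lag, max_lag + 1))


def closed_form(rho, k):
    """rho**k * {(k+1) - (k-1)*rho**2} / (1 - rho**2)."""
    return rho ** k * ((k + 1) - (k - 1) * rho ** 2) / (1 - rho ** 2)


if __name__ == "__main__":
    for k in range(4):
        print(k, lagged_product_sum(0.6, k), closed_form(0.6, k))
```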


Bias in the estimation of autocorrelations 
48.3 If r is of the form $A/\sqrt{BC}$, we have, writing a, b, c for deviations from means,

$$r = \frac{E(A)+a}{[\{E(B)+b\}\{E(C)+c\}]^{1/2}}$$

and expanding in binomial series to the second order of approximation, we find

$$E(r) = \frac{E(A)}{\{E(B)E(C)\}^{1/2}}\left\{1-\frac{E(ab)}{2E(A)E(B)}-\frac{E(ac)}{2E(A)E(C)}+\frac{E(bc)}{4E(B)E(C)}+\frac{3E(b^2)}{8E^2(B)}+\frac{3E(c^2)}{8E^2(C)}\right\}. \tag{48.17}$$

Putting

$$B = \frac{1}{n-j}\sum_{i=1}^{n-j}u_i^2-\left(\frac{1}{n-j}\sum_{i=1}^{n-j}u_i\right)^2, \tag{48.18}$$

$$C = \frac{1}{n-j}\sum_{i=1}^{n-j}u_{i+j}^2-\left(\frac{1}{n-j}\sum_{i=1}^{n-j}u_{i+j}\right)^2, \tag{48.19}$$

so that E(B) = E(C), we find, on a little reduction, using the asymptotic equivalence
of B and C,

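$$E(r_j) = \frac{E(A)}{E(B)}\left\{1-\frac{E(ab)}{E(A)E(B)}+\frac{E(b^2)}{E^2(B)}\right\}. \tag{48.20}$$

Let the variance of the series be unity and write ν = n−j. Then we have, on taking
expectations in (48.18-19),

$$E(B) = E(C) = \frac{1}{\nu}\left\{\nu-1-2\sum_{i=1}^{\nu-1}\left(1-\frac{i}{\nu}\right)\rho_i\right\}. \tag{48.21}$$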
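Now, taking

$$A = \frac{1}{n-j}\sum_{i=1}^{n-j}u_iu_{i+j}-\frac{1}{(n-j)^2}\sum_{i=1}^{n-j}u_i\sum_{i=1}^{n-j}u_{i+j}, \tag{48.22}$$

we find

$$E(A) = \rho_j-\frac{1}{\nu^2}\sum_{a=1}^{\nu}\sum_{b=1}^{\nu}\rho_{j+b-a} = \rho_j-\frac{1}{\nu}\left\{\rho_j+\sum_{i=1}^{\nu-1}\left(1-\frac{i}{\nu}\right)(\rho_{j+i}+\rho_{j-i})\right\}. \tag{48.23}$$

We have evaluated in (48.6) and (48.8) var b and cov (a, b). Substituting for the various
quantities in (48.20) we find $E(r_j)$. We shall not write this out explicitly, but shall
consider some particular cases.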


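Example 48.5
If the series is random, $\rho_j = 0$ except for $\rho_0 = 1$. We then find from (48.20)

$$E(r_j) \doteq -\frac{1}{n-j}. \tag{48.24}$$

It so happens that in this case we can evaluate the bias exactly by using the definition

$$r_j = \frac{\frac{1}{n-j}\sum_{i=1}^{n-j}(u_i-\bar u)(u_{i+j}-\bar u)}{\frac1n\sum_{i=1}^{n}(u_i-\bar u)^2}.$$

Put $z_i = u_i-\bar u$. Then

$$E(r_j) = \frac{n}{n-j}E\left(\frac{\sum_{i=1}^{n-j}z_iz_{i+j}}{\sum z_i^2}\right) = \frac{1}{n-1}E\left(\frac{\sum_{i\neq k}z_iz_k}{\sum z_i^2}\right) = \frac{1}{n-1}E\left\{\frac{(\sum z_i)^2-\sum z_i^2}{\sum z_i^2}\right\} = -\frac{1}{n-1}, \tag{48.25}$$

since $\sum z_i = 0$. This agrees with (48.24) to order $n^{-1}$. It is rather remarkable that
there is a downward bias in $r_j$ even for a random series.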


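Example 48.6
If the series is such that

$$\rho_1 = \rho,\qquad \rho_j = 0,\quad j\neq 0, 1,$$

we find from (48.20)

$$E(r_1) = \rho+\frac1n(1+\rho)(4\rho^2-2\rho-1), \tag{48.26}$$
$$E(r_2) = -\frac1n(1+2\rho+2\rho^2), \tag{48.27}$$
$$E(r_j) = -\frac1n(1+2\rho),\quad j>2. \tag{48.28}$$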
Example 48.7
For the Markoff series (47.68) with parameter ρ we find similarly

$$E(r_1) = \rho-\frac1n(1+3\rho), \tag{48.29}$$
$$E(r_j) = \rho^j-\frac1n\left\{\frac{1+\rho}{1-\rho}(1-\rho^j)+2j\rho^j\right\}. \tag{48.30}$$

The bias in all these cases is downwards and obviously may be quite serious. For
ρ = ½ in a Markoff series of 25 terms the mean value of $r_1$ would be about 0.4, not 0.5.
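
The size of the bias is readily tabulated. The sketch below assumes the first-order approximation E(r_j) ≈ ρ^j − (1/n){(1+ρ)(1−ρ^j)/(1−ρ) + 2jρ^j} for the Markoff case, which reduces to ρ − (1+3ρ)/n for j = 1; the function name is ours, and the formula is an approximation to order 1/n only.

```python
def markoff_expected_r(rho, j, n):
    """First-order approximation to E(r_j) for a Markoff series of n terms.

    E(r_j) ~ rho**j - (1/n) * {(1+rho)/(1-rho) * (1 - rho**j) + 2*j*rho**j}
    """
    bias = ((1 + rho) / (1 - rho) * (1 - rho ** j) + 2 * j * rho ** j) / n
    return rho ** j - bias


if __name__ == "__main__":
    for n in (25, 50, 100):
        print(n, markoff_expected_r(0.5, 1, n))
```

For ρ = 0.5 and n = 25 this gives 0.4, the figure quoted above; for ρ = 0 it gives −1/n whatever the lag.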


Quenouille's correction 
48.4 In the manner of (40.28) we may use the simplification of Quenouille's
method of removing bias by splitting the series into two. If r is the serial coefficient
for the whole series, $r_{(1)}$ and $r_{(2)}$ those for the two halves, we use

$$R = 2r-\tfrac12(r_{(1)}+r_{(2)}) \tag{48.31}$$

which will be unbiassed to order $n^{-1}$.
For some further results on bias see Marriott and Pope (1954), Kendall (1954), and 
Quenouille (1956) (cf. 17.10, Vol. 2). White (1961) has obtained some results for the 
Markoff case to order n~*. 
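
The correction is simple to apply in practice. The following sketch is illustrative only: the definition of the serial coefficient used (deviations from the mean of the stretch concerned, with divisors n − lag and n) is one of several possible, and the function names and the short series in the example are ours.

```python
def serial_correlation(series, lag=1):
    """Serial correlation of given lag about the mean of the stretch supplied."""
    n = len(series)
    mean = sum(series) / n
    dev = [u - mean for u in series]
    num = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n - lag)
    den = sum(d * d for d in dev) / n
    return num / den


def quenouille_corrected(series, lag=1):
    """R = 2r - (r1 + r2)/2, with r1 and r2 computed from the two halves."""
    half = len(series) // 2
    r = serial_correlation(series, lag)
    r1 = serial_correlation(series[:half], lag)
    r2 = serial_correlation(series[half:], lag)
    return 2.0 * r - 0.5 * (r1 + r2)


if __name__ == "__main__":
    data = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.0, 8.0]  # arbitrary illustrative values
    print(serial_correlation(data), quenouille_corrected(data))
```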


Some exact results 

48.5 We now proceed to consider some exact results in the distribution theory of 
serial correlations. As we might expect, exactitude has to be purchased at a price, 
usually that of assuming normality in the parent series, but occasionally, also, of simpli- 
fying the definition of the statistic under investigation. We first of all derive some 
results (due to Moran) by the method of expectations. We then obtain some distribu- 
tions (due to R. L. Anderson and a series of later writers, notably Daniels) which 
raise some quite new points in distribution theory.

48.6 Consider the case of zero autocorrelations and first serial correlation defined by

$$r_1 = \frac{\frac{1}{n-1}\sum_{i=1}^{n-1}(u_i-\bar u)(u_{i+1}-\bar u)}{\frac1n\sum_{i=1}^{n}(u_i-\bar u)^2}. \tag{48.32}$$

We have already shown at (48.25) that

$$E(r_1) = -1/(n-1). \tag{48.33}$$

Put $r_1 = \frac{n}{n-1}I$, $z_i = u_i-\bar u$, so that E(I) = −(1/n).

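We have

$$E(I^2) = E\left\{\frac{(\sum_{i=1}^{n-1}z_iz_{i+1})^2}{(\sum z_i^2)^2}\right\} = E\left[\frac{1}{(\sum z_i^2)^2}\left\{\sum_{i=1}^{n-1}z_i^2z_{i+1}^2+2\sum_{i=1}^{n-2}z_iz_{i+1}^2z_{i+2}+\sum z_iz_{i+1}z_kz_{k+1}\right\}\right] \tag{48.34}$$

where i, i+1, k, k+1 are all distinct. Thus

$$E(I^2) = E[(\textstyle\sum z^2)^{-2}\{(n-1)z_1^2z_2^2+2(n-2)z_1z_2^2z_3+(n-2)(n-3)z_1z_2z_3z_4\}]$$

or, in terms of the augmented symmetric functions (12.5, Vol. 1),

$$E(I^2) = E\left[\frac{1}{(2)^2}\left\{\frac{[2^2]}{n}+\frac{2[21^2]}{n(n-1)}+\frac{[1^4]}{n(n-1)}\right\}\right]. \tag{48.35}$$

Using Appendix Table 10, we express the augmented symmetric functions in terms of
power-sums, obtaining, since (1) = 0 here,

$$E(I^2) = E\left[\frac1n\left\{1-\frac{(4)}{(2)^2}\right\}+\frac{2}{n(n-1)}\left\{\frac{2(4)}{(2)^2}-1\right\}+\frac{1}{n(n-1)}\left\{3-\frac{6(4)}{(2)^2}\right\}\right] = \frac{1}{n-1}-\frac{n+1}{n(n-1)}E\left\{\frac{(4)}{(2)^2}\right\}. \tag{48.36}$$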
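Now in normal samples $k_4/k_2^2$ is independent of $k_2$ (cf. 37.27) and, using (12.28),

$$\frac{k_4}{k_2^2} = \frac{n-1}{(n-2)(n-3)}\left\{n(n+1)\frac{(4)}{(2)^2}-3(n-1)\right\}. \tag{48.37}$$

Hence, since the sample is normal, the expectation of $k_4/k_2^2$ vanishes, giving, from
(48.37),

$$E\left\{\frac{(4)}{(2)^2}\right\} = \frac{3(n-1)}{n(n+1)}. \tag{48.38}$$

Substitution in (48.36) and reduction then gives

$$E(I^2) = \frac{n^2-3n+3}{n^2(n-1)} \tag{48.39}$$

and hence in the normal case

$$\operatorname{var}I = \frac{(n-2)^2}{n^2(n-1)},\qquad \operatorname{var}r_1 = \frac{(n-2)^2}{(n-1)^3} = \frac1n+O\left(\frac{1}{n^2}\right). \tag{48.40}$$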

Since $n(4) \geq (2)^2$ by Cauchy's inequality, $E\{(4)/(2)^2\} \geq n^{-1}$ for any parent, and
(48.36) gives quite generally, as Moran (1967) pointed out,

$$\operatorname{var}r_1 \leq \frac{n(n-2)}{(n-1)^3} < \frac{1}{n-1}. \tag{48.41}$$

The normal value (48.40) almost attains this maximum. For long-tailed distributions,
$(4)/(2)^2$ will be large and var $r_1$ correspondingly small.

Moran (1947-8) gives results for the circular definition and also derives the third 
and fourth moments—cf. Exercises 48.6, 48.12. Jenkins (1954b, 1956) has used similar 
methods for the joint distribution of serial correlations (cf. 48.26 and Exercises 48.18 
and 48.19). 
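
The closeness of the normal-theory variance to the general upper bound is easily exhibited. The sketch below assumes the normal value var r_1 = (n−2)²/(n−1)³ and the bound n(n−2)/(n−1)³, the latter being what results from putting E{(4)/(2)²} ≥ 1/n in the expression for E(I²); exact rational arithmetic is used, and the function names are ours.

```python
from fractions import Fraction


def var_r1_normal(n):
    """Variance of r_1 for a random normal series: (n-2)^2 / (n-1)^3."""
    return Fraction((n - 2) ** 2, (n - 1) ** 3)


def var_r1_upper_bound(n):
    """Upper bound on var r_1 for any parent: n(n-2) / (n-1)^3."""
    return Fraction(n * (n - 2), (n - 1) ** 3)


if __name__ == "__main__":
    for n in (10, 25, 100):
        v, b = var_r1_normal(n), var_r1_upper_bound(n)
        print(n, float(v), float(b), float(v / b))
```

The ratio of the two is (n−2)/n, so that for n = 100 the normal variance is already 98 per cent of the bound.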


R. L. Anderson’s distribution 
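48.7 For reasons which will become evident later, we now consider the
distribution of the first circular serial correlation

$$r_1 = \frac{u_1u_2+u_2u_3+\dots+u_{n-1}u_n+u_nu_1-n\bar u^2}{\sum(u_i-\bar u)^2}. \tag{48.42}$$

Following R. L. Anderson (1942) we consider the distribution of this statistic in samples
from an independent normal series. We now drop the suffix to r.

We shall seek a linear transformation to variables $\xi_i$ (i = 1, 2, ..., n) such that r
transforms to $\sum\lambda_i\xi_i^2/\sum\xi_i^2$. The point about using a circular definition is that we
can determine the λ's explicitly.

Any orthogonal transformation will transform the denominator of r to the required
form. The numerator of r is equal to q, say, with

$$q = \sum_{j=1}^{n}u_ju_{j+1}-\frac1n\left(\sum_{j=1}^{n}u_j\right)^2 = -\frac1n\sum u_j^2+\left(1-\frac2n\right)\sum u_ju_{j+1}-\frac2n\sum{}'u_ju_k, \tag{48.43}$$

where $u_{n+1} = u_1$ and $\sum'$ extends over the pairs j < k which are not adjacent on the circle. Consider

$$q-\lambda\sum u_j^2. \tag{48.44}$$

As in the case of principal components (cf. 43.6), if we determine λ so as to maximize
this quantity, we shall arrive at the sum $\sum\lambda_i\xi_i^2$, as desired. We do not need to find the
transformation: the λ's are all that we require, and from (48.43-4) they are the roots of

$$\begin{vmatrix}-\lambda-\frac1n & \frac12(1-\frac2n) & -\frac1n & \dots & -\frac1n & \frac12(1-\frac2n)\\ \frac12(1-\frac2n) & -\lambda-\frac1n & \frac12(1-\frac2n) & \dots & -\frac1n & -\frac1n\\ \vdots & & & & & \vdots\\ \frac12(1-\frac2n) & -\frac1n & -\frac1n & \dots & \frac12(1-\frac2n) & -\lambda-\frac1n\end{vmatrix} = 0. \tag{48.45}$$

This determinant is a circulant, of the general form

$$D = \begin{vmatrix}a_1 & a_2 & \dots & a_n\\ a_n & a_1 & \dots & a_{n-1}\\ \vdots & & & \vdots\\ a_2 & a_3 & \dots & a_1\end{vmatrix}. \tag{48.46}$$

Let $\omega_1, \omega_2, \dots, \omega_{n-1}, \omega_n\ (= 1)$ be the n roots of unity. Multiply the jth column of
D by $\omega_k^{j-1}$ and sum for j. Then it will be seen that $\sum_ja_j\omega_k^{j-1}$ is a factor of D. This
is true for any k, and hence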

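$$D = \prod_{k=1}^{n}\sum_{j=1}^{n}a_j\omega_k^{j-1}. \tag{48.47}$$

Now

$$\sum_{j=1}^{n}\omega_k^{j-1} = 0,\quad k\neq n;\qquad = n,\quad k = n,$$

and thus

$$\sum_{j=3}^{n-1}\omega_k^{j-1} = n-3,\quad k = n;\qquad = -(1+\omega_k+\omega_k^{-1}),\quad k\neq n. \tag{48.48}$$

Putting the appropriate values of the a's, from (48.45), in (48.47), and using (48.48)
we find, on some reduction,

$$D = -\lambda\prod_{k=1}^{n-1}\{-\lambda+\tfrac12(\omega_k+\omega_k^{-1})\},$$

and, since $\tfrac12(\omega_k+\omega_k^{-1}) = \cos(2\pi k/n)$, we have

$$D = -\lambda\prod_{k=1}^{n-1}\left(-\lambda+\cos\frac{2\pi k}{n}\right). \tag{48.49}$$

Thus the roots are λ = 0, $\lambda_k = \cos(2\pi k/n)$, k = 1, 2, ..., n−1. It is crucial to
observe that the λ-roots occur in pairs, except perhaps for one, equal to −1, which arises
when n is even. These paired terms may be put together to give $v_i$ (= sum of squares of two ξ's). Thus we
have

$$q = \sum_{i=1}^{\frac12(n-1)}\lambda_iv_i,\quad n \text{ odd}, \tag{48.50}$$

$$= \sum_{i=1}^{\frac12(n-2)}\lambda_iv_i-v,\quad n \text{ even}, \tag{48.51}$$

where $v_i$ is distributed as χ² with 2 d.fr. and v with one d.fr. The denominator of r,
say p, is distributed as χ² with n−1 d.fr. Our problem is then reduced to finding the
distribution of

$$r = \sum_{i=1}^{\frac12(n-1)}\lambda_iv_i\Big/\sum v_i,\quad n \text{ odd}, \tag{48.52}$$

$$= \left(\sum\lambda_iv_i-v\right)\Big/\left(\sum v_i+v\right),\quad n \text{ even}. \tag{48.53}$$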
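
That the roots are cos(2πk/n) may be confirmed without evaluating any determinant. The sketch below (ours; it assumes n ≥ 3) forms the matrix of the quadratic form q, namely the circular sum of products u_i u_{i+1} less (Σu_i)²/n, and checks that the vector with elements cos(2πkj/n) is reproduced on multiplication, with multiplier cos(2πk/n).

```python
import math


def circular_form_matrix(n):
    """Symmetric matrix of q = sum u_i u_{i+1} (circular) - (1/n)(sum u_i)^2, n >= 3."""
    m = [[-1.0 / n] * n for _ in range(n)]
    for i in range(n):
        k = (i + 1) % n
        m[i][k] += 0.5
        m[k][i] += 0.5
    return m


def check_root(n, k):
    """Largest element of |M x - cos(2 pi k/n) x| for x_j = cos(2 pi k j/n)."""
    m = circular_form_matrix(n)
    lam = math.cos(2 * math.pi * k / n)
    x = [math.cos(2 * math.pi * k * j / n) for j in range(n)]
    mx = [sum(m[i][j] * x[j] for j in range(n)) for i in range(n)]
    return max(abs(mx[i] - lam * x[i]) for i in range(n))


if __name__ == "__main__":
    for k in range(1, 6):
        print(k, math.cos(2 * math.pi * k / 6), check_root(6, k))
```

The rows of the matrix sum to zero, which corresponds to the root λ = 0; and since cos(2πk/n) = cos{2π(n−k)/n} the remaining roots are seen to fall into equal pairs.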


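48.8 The distributional problem was solved by R. L. Anderson (1942) to whom
the foregoing is due. We will illustrate the method on the particular case n = 6 and
then quote the general result.

We have, for n = 6,

$$r = \frac{\lambda_1v_1+\lambda_2v_2-v}{v_1+v_2+v}, \tag{48.54}$$

where

$$\lambda_1 = \cos\frac{2\pi}{6} = \frac12,\qquad \lambda_2 = \cos\frac{4\pi}{6} = -\frac12. \tag{48.55}$$

The joint distribution of the v's is given by

$$dF(v_1, v_2, v) = \frac{1}{4\sqrt{2\pi}}e^{-\frac12v_1-\frac12v_2-\frac12v}\,v^{-\frac12}\,dv\,dv_1\,dv_2 \tag{48.56}$$
$$= k\,v^{-\frac12}e^{-\frac12p}\,dv_1\,dv_2\,dv. \tag{48.57}$$

We have to consider two cases,

$$\lambda_2\leq r\leq\lambda_1\qquad\text{and}\qquad -1\leq r\leq\lambda_2.$$

Note from (48.52-3) that $r\leq\lambda_1$, the larger of the roots in λ. Consider the first case.
From (48.54) we have

$$v_1 = \frac{1}{\lambda_1-\lambda_2}\{p(r-\lambda_2)+v(1+\lambda_2)\}, \tag{48.58}$$

$$v_2 = \frac{1}{\lambda_1-\lambda_2}\{p(\lambda_1-r)-v(1+\lambda_1)\}. \tag{48.59}$$

The Jacobian of the transformation will be found to be

$$\left|\frac{\partial(v_1, v_2, v)}{\partial(p, r, v)}\right| = \frac{p}{\lambda_1-\lambda_2}.$$

Hence, from (48.57) the distribution of p, r and v is given by

$$dF = k\,v^{-\frac12}e^{-\frac12p}\,\frac{p}{\lambda_1-\lambda_2}\,dp\,dr\,dv. \tag{48.60}$$

Integrating out for v, noting the limits determined by (48.58) and (48.59), we have

$$dF = \frac{2k}{\lambda_1-\lambda_2}\,p^{\frac32}e^{-\frac12p}\left(\frac{\lambda_1-r}{1+\lambda_1}\right)^{\frac12}dp\,dr. \tag{48.61}$$

Finally, integrating for p from 0 to ∞, we obtain

$$dF(r) = \frac32\,\frac{(\lambda_1-r)^{\frac12}}{(\lambda_1-\lambda_2)(1+\lambda_1)^{\frac12}}\,dr,\qquad \lambda_2\leq r\leq\lambda_1. \tag{48.62}$$

For the other part of the range we find similarly

$$dF(r) = \frac{3}{2(\lambda_1-\lambda_2)}\left\{\frac{(\lambda_1-r)^{\frac12}}{(1+\lambda_1)^{\frac12}}-\frac{(\lambda_2-r)^{\frac12}}{(1+\lambda_2)^{\frac12}}\right\}dr,\qquad -1\leq r\leq\lambda_2. \tag{48.63}$$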


48.9 It is typical of these distributions that they split into separate analytical
expressions for values of r between the critical roots λ. The frequency curves are
continuous, but the derivatives not necessarily so. It is also typical that the distribution
functions are easily written down. For example, corresponding to (48.62) and (48.63)
we have

$$\operatorname{Prob}(R>r) = \frac{(\lambda_1-r)^{\frac32}}{(\lambda_1-\lambda_2)(1+\lambda_1)^{\frac12}},\qquad \lambda_2\leq r\leq\lambda_1, \tag{48.64}$$

$$= \frac{(\lambda_1-r)^{\frac32}}{(\lambda_1-\lambda_2)(1+\lambda_1)^{\frac12}}-\frac{(\lambda_2-r)^{\frac32}}{(\lambda_1-\lambda_2)(1+\lambda_2)^{\frac12}},\qquad -1\leq r\leq\lambda_2. \tag{48.65}$$

We quote the general form of the complement to the distribution function:
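$$\operatorname{Prob}(R>r) = \sum_{i=1}^{m}\frac{(\lambda_i-r)^{\frac12(n-3)}}{\prod_{j=1,\,j\neq i}^{\frac12(n-1)}(\lambda_i-\lambda_j)},\qquad \lambda_{m+1}\leq r\leq\lambda_m,\quad n \text{ odd}, \tag{48.66}$$

$$= \sum_{i=1}^{m}\frac{(\lambda_i-r)^{\frac12(n-3)}}{(1+\lambda_i)^{\frac12}\prod_{j=1,\,j\neq i}^{\frac12(n-2)}(\lambda_i-\lambda_j)},\qquad \lambda_{m+1}\leq r\leq\lambda_m,\quad n \text{ even}. \tag{48.67}$$

The frequency function falls into ½(n−1) or ½(n−2) pieces according to the parity
of n.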


48.10 We shall have to pass over rather cursorily a number of features of the 
distribution which are of mathematical rather than statistical interest. 


(a) For $r_l$ with l not equal to unity the circulant has factors typified by $\lambda-\cos(2\pi lk/n)$.
If l is prime to n the circulant is the same as before and the distribution remains
unchanged. If not, it would seem that the analytical form is different.

(b) There are other ways of obtaining a circulant without assuming a circular definition
of the coefficient. For example, with n = 2m,

$$\frac{u_1u_2+\dots+u_{m-1}u_m+u_{m+1}u_{m+2}+\dots+u_{n-1}u_n}{\sum u^2}$$

will be found to have the characteristic property of paired roots in λ, and therefore
follows Anderson's distribution. Other cases are given by Durbin and Watson
(1950-1).


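48.11 We proceed to derive the characteristic function of q and p. Taking
temporarily s, t as the dummy variables, we have for the joint c.f. of q and p, the
numerator and denominator of r,

$$\phi(s, t) \propto \int\dots\int\exp\left[-\tfrac12\left\{\sum u^2-2it\sum(u-\bar u)^2-2is(u_1u_2+\dots+u_nu_1-n\bar u^2)\right\}\right]du = \Delta^{-\frac12} \tag{48.68}$$

where Δ is the circulant determinant whose first row is

$$\left(a_1, a_2, a_3, \dots, a_3, a_2\right),\quad a_1 = 1-2it\left(1-\frac1n\right)+\frac{2is}{n},\quad a_2 = -is\left(1-\frac2n\right)+\frac{2it}{n},\quad a_3 = \frac{2i(s+t)}{n}. \tag{48.69}$$

This is the same kind of circulant which we had at (48.45) and reduces as at (48.49) to

$$\Delta = \prod_{k=1}^{n-1}\left\{1-2i\left(t+s\cos\frac{2\pi k}{n}\right)\right\}. \tag{48.70}$$

Taking logarithms and identifying coefficients we get, for the cumulants of q and p (the first suffix relating to q and the second to p),

$$\kappa_{ab} = a!\,b!\,\frac{2^{a+b-1}}{a+b}\binom{a+b}{a}\sum_{k=1}^{n-1}\left(\cos\frac{2\pi k}{n}\right)^a \tag{48.71}$$

$$= 2^{a+b-1}(a+b-1)!\sum_{k=1}^{n-1}\left(\cos\frac{2\pi k}{n}\right)^a. \tag{48.72}$$

Now we have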


$$\sum_{k=1}^{n-1}\cos\frac{2\pi k}{n} = \sum_{k=1}^{n-1}\cos^3\frac{2\pi k}{n} = \dots = -1, \tag{48.73}$$

$$\sum_{k=1}^{n-1}\cos^2\frac{2\pi k}{n} = \tfrac12(n-2);\qquad \sum_{k=1}^{n-1}\cos^4\frac{2\pi k}{n} = \tfrac18(3n-8). \tag{48.74}$$

Substituting, we find

$$\kappa_{10} = -1;\quad \kappa_{01} = n-1;\quad \kappa_{20} = n-2;\quad \kappa_{11} = -2;\quad \kappa_{02} = 2(n-1);$$
$$\kappa_{21} = 4(n-2);\quad \kappa_{12} = -8;\quad \kappa_{03} = 8(n-1). \tag{48.75}$$

It will be seen that $\kappa_{20}$ and $\kappa_{02}$ are of order n, whereas higher order cumulants are
of no higher order. Thus, in standard measure, the higher order cumulants tend to
zero, and the distribution accordingly tends to bivariate normality. Moreover $\kappa_{11}$ in
standard measure tends to zero, so that q and p are uncorrelated and hence, through
normality, independent in the limit.

In fact, r and p are independent for normal variation and hence

$$E(q^m) = E(r^mp^m) = E(r^m)E(p^m),\qquad E(r^m) = \frac{E(q^m)}{E(p^m)}. \tag{48.76}$$

We can then evaluate the moments of r from (48.75), finding

$$\mu_1'(r) = -\frac{1}{n-1},\qquad \mu_2(r) = \frac{n(n-3)}{(n+1)(n-1)^2}, \tag{48.77}$$

$$\mu_3(r) = \frac{4n(n-5)}{(n-1)^3(n+1)(n+3)}. \tag{48.78}$$

The mean agrees with the exact non-circular result at (48.25), but the variance is only
the same to order $n^{-1}$ as the non-circular variance (48.40).
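
The sums of powers of cosines, and the mean and variance which follow from them, can be verified directly. The sketch below is ours: it assumes that E(q) = −1 and E(q²) = (n−2) + 1 (from the first two cumulants of q), that p is distributed as χ² with n−1 degrees of freedom, and that E(r^m) = E(q^m)/E(p^m).

```python
import math
from fractions import Fraction


def cosine_power_sum(n, a):
    """Sum over k = 1..n-1 of cos(2 pi k / n) ** a."""
    return sum(math.cos(2 * math.pi * k / n) ** a for k in range(1, n))


def circular_r_mean_variance(n):
    """Mean and variance of the circular r from E(r^m) = E(q^m) / E(p^m)."""
    e_q = Fraction(-1)                   # first cumulant of q
    e_q2 = Fraction(n - 2) + e_q ** 2    # second cumulant plus squared mean
    e_p = Fraction(n - 1)                # chi-squared with n-1 d.fr.
    e_p2 = Fraction((n - 1) * (n + 1))
    mean = e_q / e_p
    return mean, e_q2 / e_p2 - mean ** 2


if __name__ == "__main__":
    print(cosine_power_sum(9, 1), cosine_power_sum(9, 2), cosine_power_sum(9, 4))
    print(circular_r_mean_variance(10))
```

For n = 10 this gives a mean of −1/9 and a variance of 70/891, the latter agreeing with n(n−3)/{(n+1)(n−1)²}.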


48.12 Dixon (1944) obtained an approximate form for Anderson's distribution by
an ingenious smoothing of the characteristic function (48.68-70). Write temporarily

$$\alpha = 1-2it,\qquad \beta = -2is,\qquad \theta_k = 2\pi k/n. \tag{48.79}$$

We then have approximately

$$\phi(s, t) \propto \prod_{k=1}^{n-1}(\alpha+\beta\cos\theta_k)^{-\frac12}$$
$$= (\alpha+\beta)^{\frac12}\exp\left[-\tfrac12\sum_{k=1}^{n}\log(\alpha+\beta\cos\theta_k)\right]$$
$$\doteq (\alpha+\beta)^{\frac12}\exp\left[-\frac{n}{4\pi}\int_0^{2\pi}\log(\alpha+\beta\cos\theta)\,d\theta\right]$$
$$= (\alpha+\beta)^{\frac12}\exp\left[-\tfrac12n\log\{\tfrac12(\alpha+\sqrt{\alpha^2-\beta^2})\}\right]$$
$$= 2^{\frac12n}(\alpha+\beta)^{\frac12}\{\alpha+(\alpha^2-\beta^2)^{\frac12}\}^{-\frac12n}. \tag{48.80}$$

By successive differentiation we can now obtain the moments of q and p. In fact, by
differentiating for s and integrating for t we can find moments of the form $E(q^a/p^b)$.
We find the unexpectedly simple forms


$$\mu_1' = -\frac{1}{n-1}, \tag{48.81}$$
$$\mu_2' = \frac{1}{n+1}, \tag{48.82}$$
$$\mu_3' = -\frac{3}{(n-1)(n+3)}, \tag{48.83}$$
$$\mu_4' = \frac{3}{(n+1)(n+3)}. \tag{48.84}$$

It may be shown (Dixon, 1944) that these moments are, in fact, exact. The moments
of r² may be seen to be, up to the (½n)th, those of the distribution

$$dF = \frac{\Gamma\{\tfrac12(n+1)\}}{\Gamma(\tfrac12n)\Gamma(\tfrac12)}\,(r^2)^{-\frac12}(1-r^2)^{\frac12(n-2)}\,d(r^2). \tag{48.85}$$

Thus, from (16.62), the squared serial correlation has the same distribution, approximately,
as the squared ordinary correlation coefficient in samples of n+2 from an uncorrelated
normal population.
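
The agreement of the even moments with those of the approximating distribution is a matter of simple arithmetic. The sketch below assumes that the approximating distribution of r² is of the first Beta type with parameters ½ and ½n, whose sth moment is Π_{i<s}(½+i)/(½+½n+i); exact rational arithmetic is used and the function names are ours.

```python
from fractions import Fraction


def beta_moment(a, b, s):
    """s-th moment about zero of a Beta(a, b) variate."""
    value = Fraction(1)
    for i in range(s):
        value *= Fraction(a + i) / Fraction(a + b + i)
    return value


def r_squared_moment(n, s):
    """E(r^{2s}) when r^2 is taken to be Beta(1/2, n/2)."""
    return beta_moment(Fraction(1, 2), Fraction(n, 2), s)


if __name__ == "__main__":
    n = 10
    print(r_squared_moment(n, 1), Fraction(1, n + 1))
    print(r_squared_moment(n, 2), Fraction(3, (n + 1) * (n + 3)))
```

The first two moments so obtained are 1/(n+1) and 3/{(n+1)(n+3)}, which are the second and fourth moments of r quoted above.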

Dixon (1944) also treats the case when the series has known (zero) mean and the 
case of a coefficient of lag / — 1. The same result holds, with n+1 replacing n when 
the mean is known, up to (”/2m) moments, where m is the largest common factor of 
land m. Cf. Exercise 48.20. 
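
The agreement between the moments (48.81)-(48.84), the variance (48.77) and the beta form (48.85) can be checked with exact rational arithmetic. The following is only a sketch: the function names and the choice $n = 20$ are illustrative assumptions, not part of the text.

```python
from fractions import Fraction


def dixon_raw_moments(n):
    """Raw moments mu'_1..mu'_4 of r as quoted in (48.81)-(48.84)."""
    n = Fraction(n)
    return {
        1: -1 / (n - 1),
        2: 1 / (n + 1),
        3: -3 / ((n - 1) * (n + 3)),
        4: 3 / ((n + 1) * (n + 3)),
    }


def beta_moment_of_r_squared(n, j):
    """E[(r^2)^j] when r^2 follows the Beta(1/2, n/2) law written in (48.85)."""
    a, b = Fraction(1, 2), Fraction(n, 2)
    out = Fraction(1)
    for i in range(j):
        out *= (a + i) / (a + b + i)
    return out


n = 20  # illustrative series length (an assumption for this example)
raw = dixon_raw_moments(n)
variance = raw[2] - raw[1] ** 2  # second moment about the mean

print("mean     =", raw[1])
print("variance =", variance)
print("E(r^2), E(r^4) from the beta form:",
      beta_moment_of_r_squared(n, 1), beta_moment_of_r_squared(n, 2))
```

The variance computed from the raw moments reduces to $n(n-3)/\{(n-1)^2(n+1)\}$, and the even moments of the beta form reproduce $\mu_2'$ and $\mu_4'$.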


48.13 Koopmans (1942) reached the same result by a different route. He expressed
the c.f. of $p$ and $q$ as a contour integral and smoothed the values of $\lambda_k$, as above, by spreading
them uniformly round a circle before integrating. This led him to the expression


—1)2in [arecosr 
he) = go [| (cos a—r)#"-# sin nx sin а dx. (48.86) 
° 
Rubin (1945) evaluated the integral by showing, with the aid of a partial integration, 
that 


2, п) = —nrh(r, n—2) 
and proceeding by induction. 


48.14 We had better pause at this point and review progress. Our large-sample 
results are explicit, but troublesome to apply; and in any case, large samples in time- 
series analysis are rare outside the domain of physics and meteorology. For small 
samples we have obtained exact results for a random series; but again, we should not, 
as a rule, wish to apply tests to a series until we were satisfied by simpler tests that it 
was not random. Moreover, our exact results depend on the assumption of normality 
and, for the most part, apply to circularly defined coefficients. Nevertheless they are 
very illuminating and provide a number of interesting further problems. 
The Madow-Leipnik distribution

48.15 If the statistics $t_1, t_2, \ldots, t_k$ are sufficient for $\theta_1, \theta_2, \ldots, \theta_k$ we have, in an
obvious notation,

\[ P(u_1, \ldots, u_n \mid \theta) = Q(t \mid \theta)\,R(u) \]

and hence

\[ P(t \mid \theta) = P(t \mid \theta_0)\,\frac{Q(t \mid \theta)}{Q(t \mid \theta_0)}. \tag{48.87} \]

Madow (1945) used this result to derive the approximate distribution of the serial
correlations for the non-null case from those for the null case $\theta = \theta_0$.


Suppose that $u_1, u_2, \ldots, u_n$ have a joint normal distribution, with mean $\mu$, of the
following type:

\[ \log L = \text{constant} - \tfrac12\Big[A\sum_{i=1}^{n}(u_i-\mu)^2 + 2B\sum_{i=1}^{n}(u_i-\mu)(u_{i+1}-\mu)\Big]. \tag{48.88} \]

As before, we assume a circular definition of $r$, with $q$ and $p$ as its numerator and
denominator. Then $\bar{u}$, $p$ and $q$ are sufficient for $\mu$, $A$ and $B$. We take as null the
Anderson case with $A = 1$, $B = 0$ and hence find, from (48.87),

\[ P(p, q \mid A, B) \propto P(p, q \mid 1, 0)\,\exp\{-\tfrac12(A-1)p - Bq\} \]

and since $r$ and $p$ are independent, with $p$ a multiple of a $\chi^2(n-1)$ variable, this is

\[ \propto p^{\frac12(n-3)}\,e^{-p(\frac12 A+Br)}\,f(r), \tag{48.89} \]

where $f(r)$ is the null distribution of $r$. We integrate out from $p = 0$ to $\infty$ to get

\[ P(r \mid A, B) \propto \frac{f(r)}{(\tfrac12 A+Br)^{\frac12(n-1)}}. \tag{48.90} \]

Thus the non-null distribution differs from the null distribution in having the factor
$(\tfrac12 A+Br)^{-\frac12(n-1)}$ adjoined to it. If it is known that the $u$'s have zero mean, (48.90) is
modified so as to have $\tfrac12 n$ as the exponent in the denominator.


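48.16 In particular, let $u$ obey the Markoff relation

\[ u_t = \rho u_{t-1} + \varepsilon_t. \]

We know (cf. (47.70)) that

\[ \operatorname{var} u = \frac{\operatorname{var}\varepsilon}{1-\rho^2} \]

and

\[ \operatorname{cov}(u_t, u_{t+1}) = \rho \operatorname{var} u = \frac{\rho}{1-\rho^2}\operatorname{var}\varepsilon. \]

Then the distribution of the $\varepsilon$'s has likelihood given by

\[ \log L = \text{constant} - \frac{1}{2\sigma^2}\sum \varepsilon_i^2 = \text{constant} - \frac{1}{2\sigma^2}\sum (u_i - \rho u_{i-1})^2, \]

where $\sigma^2 = \operatorname{var}\varepsilon$. Approximately this is equivalent to

\[ \log L = \text{const} - \frac{1}{2\sigma^2}\Big\{(1+\rho^2)\sum u_i^2 - 2\rho\sum u_i u_{i+1}\Big\}. \tag{48.91} \]

Hence, in (48.88) we have

\[ A = \frac{1+\rho^2}{\sigma^2}, \quad B = -\frac{\rho}{\sigma^2}, \]

and thus

\[ \tfrac12 A + Br \propto (1 - 2\rho r + \rho^2). \tag{48.92} \]

Thus, from (48.90) with the modified $\tfrac12 n$ in the denominator, and the remaining factor
replaced by the approximate form (48.85), we have

\[ P(r \mid \rho) = \frac{\Gamma(\tfrac12 n+1)}{\Gamma(\tfrac12)\,\Gamma(\tfrac12 n+\tfrac12)}\,\frac{(1-r^2)^{\frac12(n-1)}}{(1-2\rho r+\rho^2)^{\frac12 n}}, \quad -1 \le r \le 1. \tag{48.93} \]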
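
As a numerical sanity check on the form (48.93), one can integrate the density over $-1 \le r \le 1$ for a few values of $\rho$. The sketch below assumes the density as written above; the Simpson-rule helper, the function names and the values $n = 20$, $\rho = 0, 0.3, 0.7$ are illustrative choices only.

```python
import math


def madow_leipnik_pdf(r, n, rho):
    """Density (48.93) for known (zero) mean, as reconstructed in the text."""
    log_const = (math.lgamma(n / 2 + 1) - math.lgamma((n + 1) / 2)
                 - 0.5 * math.log(math.pi))
    return (math.exp(log_const) * (1 - r * r) ** ((n - 1) / 2)
            / (1 - 2 * rho * r + rho * rho) ** (n / 2))


def simpson(f, a, b, steps=20000):
    """Composite Simpson rule with an even number of steps."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def total_probability(n, rho):
    return simpson(lambda r: madow_leipnik_pdf(r, n, rho), -1.0, 1.0)


for rho in (0.0, 0.3, 0.7):
    print(rho, total_probability(20, rho))
```

Under these assumptions the integral stays at unity as $\rho$ varies, which is what one expects of a proper frequency function.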


48.17 This remarkable form has been studied by Leipnik (1947), Quenouille (1948), 
Jenkins (1956) and Kendall (1957). Its moments are not nearly so easy to obtain by 
straightforward integration as might be expected. 

For the moment of order k about the origin, writing C for a constant, we have 

‚_с[1 (er 
= E (12 p*— 2pr)^ 


1 prr*-i 


rom f 1 -f (1+p%)— (1+ p*—2pr) gp 
п др -1 (1+p?—2pr) -1 — 2p(1+p?—2pr) 


= 19-02) апар Trot тағ 1, 
2 


“1+ p*—2pr 2p. | Ira 2p" 
als i ee Gane 
=z ghab 1++р*—2рг Р, 
whence we find 
Lo. IN 1+ 3 ,1YV , 
а E nat е d В .9. 
z+)“ E 4*2) с 


It will then be evident by induction that $\mu_k'$ is a polynomial of order $k$ in $\rho$. Moreover,
even-order moments contain only even powers of $\rho$, and odd-order moments contain only
odd powers of $\rho$.

Let

\[ \mu_k' = \sum_{m=0}^{k} a_{km}\rho^m. \tag{48.95} \]
Differentiating (48.94) m times and putting p = 0 we find 
1 
Ang (Ds nem Das us. (48.96) 
We then find, since $\mu_0' = 1$, $a_{00} = 1$,

\[ a_{11} = \frac{n}{n+2}, \]

giving

\[ \mu_1' = \frac{n\rho}{n+2}. \tag{48.97} \]
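Successive applications of (48.96) then yield

\[ \mu_2' = \frac{1}{n+2} + \frac{n(n+1)}{(n+2)(n+4)}\rho^2, \tag{48.98} \]

\[ \mu_3' = \frac{3n\rho}{(n+2)(n+4)} + \frac{n(n+1)}{(n+4)(n+6)}\rho^3, \tag{48.99} \]

\[ \mu_4' = \frac{3}{(n+2)(n+4)} + \frac{6n(n+1)\rho^2}{(n+2)(n+4)(n+6)} + \frac{n(n+1)(n+3)\rho^4}{(n+4)(n+6)(n+8)}, \tag{48.100} \]

and so on. In particular, for the moments about the mean

\[ \mu_2 = \frac{1}{n+2} - \frac{n(n-2)}{(n+2)^2(n+4)}\rho^2 \sim \frac{1-\rho^2}{n}, \tag{48.101} \]

\[ \mu_3 = -\frac{6n\rho}{(n+2)^2(n+4)} + \frac{2n(n-2)(3n-2)}{(n+2)^3(n+4)(n+6)}\rho^3 \sim \frac{-6\rho(1-\rho^2)}{n^2}, \tag{48.102} \]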
eem, (48.103) 


We note in particular that, in standard measure, $\mu_3$ tends to zero and $\mu_4$ tends to 3,
illustrating the tendency of the distribution to normality.

The distribution, it must be remembered, refers to the case when the mean of $u$
is known (and can therefore be assumed to be zero).


White (1957) and Leipnik (1958) have derived expressions for the moments in terms 
of polynomials of the Gegenbauer type; Leipnik derives an expression for the char- 
acteristic function and shows that it tends to the normal form. 
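
The passage from the moments about the origin to the moments about the mean in 48.17 is routine algebra, and can be confirmed exactly with rational arithmetic. This is a sketch only: the function names and the trial values of $n$ and $\rho$ are assumptions made for the example.

```python
from fractions import Fraction


def raw_moments(n, rho):
    """mu'_1, mu'_2, mu'_3 of the Madow-Leipnik distribution, (48.97)-(48.99)."""
    n, rho = Fraction(n), Fraction(rho)
    m1 = n * rho / (n + 2)
    m2 = 1 / (n + 2) + n * (n + 1) * rho ** 2 / ((n + 2) * (n + 4))
    m3 = (3 * n * rho / ((n + 2) * (n + 4))
          + n * (n + 1) * rho ** 3 / ((n + 4) * (n + 6)))
    return m1, m2, m3


def central_from_raw(n, rho):
    """Second and third moments about the mean, computed from the raw moments."""
    m1, m2, m3 = raw_moments(n, rho)
    return m2 - m1 ** 2, m3 - 3 * m1 * m2 + 2 * m1 ** 3


def central_as_stated(n, rho):
    """The closed forms (48.101) and (48.102)."""
    n, rho = Fraction(n), Fraction(rho)
    mu2 = 1 / (n + 2) - n * (n - 2) * rho ** 2 / ((n + 2) ** 2 * (n + 4))
    mu3 = (-6 * n * rho / ((n + 2) ** 2 * (n + 4))
           + 2 * n * (n - 2) * (3 * n - 2) * rho ** 3
           / ((n + 2) ** 3 * (n + 4) * (n + 6)))
    return mu2, mu3


for n, rho in [(10, Fraction(1, 2)), (25, Fraction(-3, 10)), (100, Fraction(9, 10))]:
    print(n, rho, central_from_raw(n, rho) == central_as_stated(n, rho))
```

For large $n$ the two closed forms approach $(1-\rho^2)/n$ and $-6\rho(1-\rho^2)/n^2$ respectively, as stated in the text.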


48.18 The approximate form of the variance given in (48.101),

\[ \operatorname{var} r = \frac{1-\rho^2}{n} \]

(which, we note in passing, is not the same form as for a product-moment coefficient),
suggests the normalizing transformation

\[ r = \sin z, \quad \rho = \sin\zeta. \]

This was tried by Jenkins (1954a), who found for $x = z - \zeta$


si = "HU pn 100-3) (48.104) 

me) = 5 s XI X 1,0075, (48.105) 
= 

ea = appt +O), (48.106) 

= 2p- <= =I). n-14 O(n-). (48.107) 


For moderate values of $\rho$ this may be satisfactory, but it evidently breaks down near $\rho = 1$.
Cf. Exercise 48.17.
Daniels’ approximations

48.19 A major advance in the derivation of sampling distributions of serial cor- 
relations was made by Daniels (1956), who had introduced saddlepoint methods into 
statistics systematically for the first time. For a detailed account of those methods we 
refer to Jeffreys and Jeffreys’ book Methods of Mathematical Physics (Cambridge, 
1956). Briefly, they amount to this: in the complex plane the integral of an analytic 
function around a contour which contains no singularities is zero; thus one path of 
integration can be deformed into another, provided that no singularity is crossed. The 
method looks for a path which runs through a saddlepoint of the surface, the presump- 
tion being that there the function falls away most steeply from its maximum, and hence 
that, in the neighbourhood of the saddlepoint, the values of the function being inte- 
grated are most highly concentrated. The path then gives us the steepest descent 
from a maximum value, and an expansion round the point will give us a good approxima- 
tion to the integral required. 

In applications the method simplifies where means or ratios of mean quantities are
concerned.


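48.20 As before, let $r = q/p$. Let $M(\theta_1, \theta_2)$ be the moment-generating function
of $p$ and $q$, i.e.

\[ M(\theta_1, \theta_2) = \int e^{\theta_1 p+\theta_2 q}\,dF. \tag{48.108} \]

In terms of the Fourier inversion we have

\[ f(p, q) = \frac{1}{(2\pi i)^2}\iint M(\theta_1, \theta_2)\,e^{-\theta_1 p-\theta_2 q}\,d\theta_1\,d\theta_2, \tag{48.109} \]

the integration being along the imaginary axes of $\theta_1, \theta_2$. In particular,

\[ f(p, rp) = \frac{1}{(2\pi i)^2}\iint M(\theta_1'-r\theta_2, \theta_2)\,e^{-\theta_1' p}\,d\theta_1'\,d\theta_2, \]

where the integration of $\theta_1' = \theta_1 + r\theta_2$ is taken over the imaginary axis in the $\theta_1'$ plane,
or any deformation of it which is permissible. Inverting with respect to $\theta_1'$, we have

\[ \int e^{\theta_1' p} f(p, rp)\,dp = \frac{1}{2\pi i}\int M(\theta_1'-r\theta_2, \theta_2)\,d\theta_2, \tag{48.110} \]

so that, when differentiation is permissible,

\[ \int p\,e^{\theta_1' p} f(p, rp)\,dp = \frac{1}{2\pi i}\int \frac{\partial M(\theta_1'-r\theta_2, \theta_2)}{\partial\theta_1'}\,d\theta_2, \tag{48.111} \]

and, since the expression on the left with $\theta_1' = 0$ is the frequency distribution of $r$, we have

\[ h(r) = \frac{1}{2\pi i}\int \Big[\frac{\partial M(\theta_1'-r\theta_2, \theta_2)}{\partial\theta_1'}\Big]_{\theta_1'=0}\,d\theta_2. \tag{48.112} \]

This is equivalent to Geary's result of (11.78), Vol. 1, otherwise proved in Exercise 11.24.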
Frequently we wish to transform g, and hence б„, to some other form. If the 
transformation is such that 0, = 9.(0,,03), then (48.112) becomes 


Mr) = zÍ a Гоо, -ю, 6,)} AE а, (48.113) 
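48.21 Consider first of all a coefficient circularly defined, with known (zero) mean
and unit variance, from a circular Markoff process. The joint distribution of the
$u$'s is then (cf. (48.91))

\[ dF = \frac{1-\rho^n}{(2\pi)^{\frac12 n}}\exp\Big[-\tfrac12\Big\{(1+\rho^2)\sum_{i=1}^{n}u_i^2 - 2\rho\sum_{i=1}^{n}u_i u_{i+1}\Big\}\Big]\,du_1 \ldots du_n, \tag{48.114} \]

with, of course, $u_{n+1} = u_1$.
The m.g.f. (compare (48.69)) is

\[ M(\theta_1, \theta_2) = (1-\rho^n)\,\Delta^{-\frac12}, \tag{48.115} \]

where

\[
\Delta = \begin{vmatrix}
1+\rho^2-2\theta_1 & -(\rho+\theta_2) & 0 & \cdots & -(\rho+\theta_2) \\
-(\rho+\theta_2) & 1+\rho^2-2\theta_1 & -(\rho+\theta_2) & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
-(\rho+\theta_2) & 0 & 0 & \cdots & 1+\rho^2-2\theta_1
\end{vmatrix}. \tag{48.116}
\]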


'This is a circulant which reduces to 


LIS 
А = (Lept- 20 - 202) П {1 4p? — 20, —2(p+0,) cos ZR, 
=1 
ra а (48.117) 


Then A reduces to 
4 = eth g (2-2 cos 7H 4.1) 
n 


хуа Lei 


тетке сс (48.118) 
Непсе 
уу ЕУ 
M(0,—10,0,) = =i (1—2рг - p? 28e" (48.119) 
Log P -2pr p? 26.) 
where 0,— —p+ dea (48.120) 
90; (1—p") (1-2?) (1- 2rz 4 22)in-* 
Then (3), - (1—2") (1—2pr + p? 20,)"- I^ (48.121) 
and (48.113) becomes

\[ h(r) = \frac{(n-2)(1-\rho^n)}{2\pi i\,(1-2\rho r+\rho^2)^{\frac12 n}}\int \frac{(1-z^2)(1-2rz+z^2)^{\frac12 n-2}}{1-z^n}\,dz. \tag{48.122} \]

There remains to determine the path of integration. Consider the pair of trans- 
formations which together compose (48.117): 


T _ 1—2pr+p? 
tyre tl inte es (48.123) 
The region $|z| < 1$ will be seen to be mapped on the whole $\theta_2$-plane cut along the real
axis exterior to the interval

\[ -\frac{(1+\rho)^2}{2(1+r)}, \quad \frac{(1-\rho)^2}{2(1-r)}. \]

Any path in the $\theta_2$-plane running from $\tau - i\infty$ through the gap in the real axis to $\tau + i\infty$
corresponds to a path in $|z| < 1$ running from $e^{-i\phi}$ to $e^{i\phi}$, where $r = \cos\phi$. The
path of integration for (48.122) is therefore of this form.

Consider the factor $(1-2rz+z^2)^{\frac12 n-2} = \{1-r^2+(z-r)^2\}^{\frac12 n-2}$ in the integrand of
(48.122). It has a saddlepoint (maximum) where $z = r$, a real point. Further, the
path of steepest descent away from this point is perpendicular to the real axis (a result
we quote without proof). Thus the path of integration required is the straight line
joining $e^{-i\phi}$, $e^{i\phi}$.

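So far the results are exact. Let us now neglect the factor $\rho^n$ in (48.122) and $1 - z^n$
in the denominator of the integrand in (48.122). We then have

\[ h(r) \sim \frac{n-2}{2\pi i\,(1-2\rho r+\rho^2)^{\frac12 n}}\int (1-z^2)(1-2rz+z^2)^{\frac12 n-2}\,dz. \]

Put

\[ z = r + iu(1-r^2)^{\frac12}. \tag{48.124} \]

Then $-1 \le u \le 1$ and we have

\[ h(r) \sim \frac{(n-2)(1-r^2)^{\frac12(n-1)}}{2\pi\,(1-2\rho r+\rho^2)^{\frac12 n}}\int_{-1}^{1}(1+u^2)(1-u^2)^{\frac12 n-2}\,du = \frac{\Gamma(\tfrac12 n+1)\,(1-r^2)^{\frac12(n-1)}}{\Gamma(\tfrac12)\,\Gamma(\tfrac12 n+\tfrac12)\,(1-2\rho r+\rho^2)^{\frac12 n}}, \tag{48.125} \]

which is the Madow-Leipnik distribution (48.93).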
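
The reduction that follows the substitution (48.124) may be set out step by step. The working below is a sketch under the approximations already made (the factors $\rho^n$ and $z^n$ neglected); it uses only the substitution and the beta integral.

```latex
% With z = r + iu(1-r^2)^{1/2}, so that dz = i(1-r^2)^{1/2} du:
\begin{align*}
1 - 2rz + z^2 &= (1-r^2) + (z-r)^2 = (1-r^2)(1-u^2), \\
1 - z^2 &= (1-r^2)(1+u^2) - 2iru(1-r^2)^{1/2}.
\end{align*}
% The term odd in u integrates to zero over (-1, 1), leaving
\begin{align*}
\int (1-z^2)(1-2rz+z^2)^{\frac12 n-2}\,dz
  &= i\,(1-r^2)^{\frac12(n-1)} \int_{-1}^{1} (1+u^2)(1-u^2)^{\frac12 n-2}\,du, \\
\int_{-1}^{1} (1+u^2)(1-u^2)^{\frac12 n-2}\,du
  &= B(\tfrac12, \tfrac12 n-1) + B(\tfrac32, \tfrac12 n-1)
   = \frac{n}{n-1}\,\frac{\Gamma(\tfrac12)\,\Gamma(\tfrac12 n-1)}{\Gamma(\tfrac12 n-\tfrac12)}.
\end{align*}
% Multiplying by (n-2)/(2\pi) and using x\Gamma(x) = \Gamma(x+1) three times:
\begin{equation*}
\frac{n-2}{2\pi}\cdot\frac{n}{n-1}\cdot
\frac{\Gamma(\tfrac12)\,\Gamma(\tfrac12 n-1)}{\Gamma(\tfrac12 n-\tfrac12)}
  = \frac{\Gamma(\tfrac12 n+1)}{\Gamma(\tfrac12)\,\Gamma(\tfrac12 n+\tfrac12)} .
\end{equation*}
```

The constant so obtained is the one appearing in (48.93), which is why the first saddlepoint term integrates to unity.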


48.22 If we are not content to neglect the factor $1 - z^n$ in (48.122) the integral
cannot be evaluated in closed form. If, however, we expand it in powers of $z^n$ we
obtain on integration the series

T (4n 1) (1— p") f((1—72)"-» 3 а 


HO 7 арр гона) TGr ан 
5 ат" 
+ 5, (1-9 R 
з/п Тү dris (48.126) 
2 r(F+5) 


Daniels (1956), to whom this result is due, has obtained an upper bound to the error
involved in approximating to (48.122) by its first term. The error is, in fact, small
near $\rho = 0$ but not near $\rho = 0.5$ for $n$ of the order of 20.


48.23 By the use of this method Daniels obtained a number of further results 
which we quote without the detailed derivation. 
In the Markoff case, for a circular process with unknown mean and a circular serial 
correlation coefficient, 
м (n-3)0-p) — | (1—з)(1—»)(1—2уа+%-® 
OE 2ai(1— p) (1 — 2pr +p?) 1-а c ишу 
Again ignoring the factor $z^n$ we have


nTn—3)(1 72) 1 
“оту d t de cg f -:-la +n}. (48.128) 
It is to be noted that we can derive the moments of this distribution from those of the
Madow-Leipnik distribution (48.93).

An alternative form may be derived by the following considerations: the final term
in (48.128) is relatively of order $n^{-1}$. If we replace $r$ by $\rho$ in it, the result remains true
to order $n^{-1}$. We may therefore remove it, but the constants in (48.128) then need
revaluation to ensure that the integral of $h(r)$ over the variate range is unity. We
arrive at


w= RARBG ron. eum 


48.24 For the Markoff case and a non-circular process with known mean consider 
the non-circular statistic, defined by 
шиа... LA 
d ++ (48:430) 
Daniels finds 
"PERSE 
2zi T'(4n—2) (1— pr) (1 —2pr +p?" 
or an equivalent approximation 
M) = TQN41) (1-r9«-» 
ai T(3N +2) (1—2pr +p?) 


(12-O(n-1) (48.131) 


{1+ O(n-*/*)} (48.132) 
where 
p? 
N= Аун (48.133) 


48.25 For the non-circular Markoff process with unknown mean, using r defined 

by (48.130) and N by (48.133) we have 
S TON +3) (i=7) (Ue ae -3/2 
MO = затору -A-0 +p} (Zor pD * 00 19. (48134) 

48.26 It is possible to take these results a good deal further. Daniels (1956) deals
with the general autoregressive process circularly defined.

In cases of higher order than the Markoff process it is also of some interest to con- 
sider the joint distribution of two or more serial correlations and of the partial correla- 
tions. Quenouille (1949b) was the first to do so. For some later work see Jenkins 
(1954b, 1956), Watson (1956) and Daniels (1956). To save space we shall not enter
into a detailed discussion of the results. Except in the Markoff case it appears that 
only circularly defined statistics and processes are reasonably tractable. Daniels’ 
method can be extended to the non-circular case, but apparently nobody has yet had 
the stamina to embark on the labour involved. 


48.27 Serial correlation distributions in the non-normal case have received very little 
attention. Sampling experiments by D. R. Cox (1966) and by Quenouille (1948) suggest 
that in the null case normal theory remains approximately valid for a wide range of
distributions, even for $n = 10$ or 20. However, very long-tailed distributions have smaller
variance for $r_1$. Cf. 48.6 above and Moran (1967), who investigates the negative exponential
parent in detail.
EXERCISES
48.1 Verify equation (48.12). 


48.2 In a Yule scheme given by 
ш—1-1ш-у+0-5ш-—;, = = 
show that approximately 
var r, = 244/n. 
(Bartlett, 1946) 
48.3 For a series in which p, = 0, s>4, show that 


1 3 
Er) = m (2p 2pi*2p), j>3. 


48.4 А sample correlation coefficient is defined circularly as 


a 
E uui- nü? 
1 


n= 


Show that in a Markoff scheme 


(Kendall, 1954) 


48.5 For the circular definition of the previous exercise show that, in samples from a normal
random series,

\[ \operatorname{var}(r_1) = \frac{n(n-3)}{(n+1)(n-1)^2}. \]

(Moran, 1947)


48.6 In continuation of 48.6 show that 
_ 4(n—4)(n—3) 
Hr) = (y (4.3) 
and that, for the circular definition, 
_ 5 20n-1)n-6) 
Ва 7 („туз (nt 1)(n+3)" 
(Moran, 1947) 


48.7 Evaluate (48.9) for the Markoff scheme and hence reconcile it with (48.101). 


48.8 For a large sample from a Markoff process, in the manner of 48,1, show that, in 
respect of rj, 


6432) p?) 
var о = n —p3 
— ptf 
ment | tU Pus {ais sata] 
TAE vertes] 
cov (c, v) = a [2+ EE 
and hence that, independently of x4, 
(1+ 1-р? 
м1 [0 nem =] 
(Bartlett, 1946) 


48.9 Analogously to (48.9), show that for large samples, 


1 
cov (rj, pu. X {рр ри рак + Api psx ри? — 2p] Pi pis jtk 2p je P Pi) 
E 
(Bartlett, 1946) 


48.10 Show (cf. Example 48.4) that the large-sample variance of rj at (48.9) is given by 
n var ту = [(1-- 253) co. 29+ со. 227 —4p;co. gi in G*(z) 
where G(z) is the autocorrelation generating function and “ со” means “ the coefficient of ” 
Verify on the result of Exercise 48.8. 


48.11 Continuing Exercise 48.4, show that 


Bir) = #- 1 {tea p)3jp — =) 
(Kendall, 1954) 


48.12 Defining the first serial correlation (known zero mean) as 
nol 


i=l 
show that, for a random series, ғ; and the denominator are independent and hence derive 
nn) = 0 = unm) 
n 
Hie) = Gia) 
3n? (n! -4n —9) 


Halt) = Gp 2) (0-4) (1-6) 
(Moran, 1948) 


48.13 Starting from the Madow-Leipnik distribution of (48.93), make the transformation 
r=tanhs, p—tanht, z-i- x, 
and by expansion show that the distribution of x is given by 


1 xt 
ВО) = сун) "Р (5) 
where а? = cosh?¢/n = 1/{n(1—p*)} 


and 
E 4 А12) mi -59-3p) , рл? 
ш 2 + 12 ке} 
1 5x*(1—p?) , nxt (1—p?)(1—3p*) рх 2 
fos 12 A о, 3). 
Obtain the moments and show that z is distributed approximately normally about mean 


р Р(1 + р?) 
Ср sf phy 
with variance 
1 2p? 


пр) nX1—p)* 
(Quenouille, 1948) 


48.14 Taking the Yule scheme of Exercise 48.2, show that a generating function for the 
autocorrelations is given by 


ot X pat = — —— RE = 
S а +52") (1 —1-12-1+0-52-*) 
m 14 07333052 „| 07333-0523 , 
1-1412--0:523^ 151-2714 0-527? 
where o? = var и, Squaring and expanding, show that 
Eg = 244, 
-® 
and hence confirm that approximately, for large samples, 
var r; = 2-44/п. 


(Quenouille, 19472) 


48.15 For the general linear autoregressive scheme X pedis = e show that 


(Murteira, 1951) 


48.16 For the scheme of the previous exercise, in which ғ; is replaced by Ў Biei-i, show 
1-21 


that the same limit takes the value 


(Murteira, 1951) 


48.17 In the Madow-Leipnik distribution (48.93), put 
r=sinz, p=sint, л—{=х, 
and show that 
h(x) = 
п 11 (2+13p%) ДАРА dop(idSpt) а. 
JE exp (— mafi- X- or ыт 94 12р тд + 8 ESTO в ут" 


и 1 p(2+13p?) 1 
Ес)! 


Hence derive equations (48.104) and (48.105). 


(Jenkins, 1954a) 
48.18 In samples from a random series with zero mean and unit variance, show that the 
characteristic function of 
p= 1510, q = шш, r-üZuuns 
circularly defined, is the circulant 
n 4 
II (1-0-2 о, xt!) 5 
п 


k=1 


Hio = Hr = 0, Изо = Hos = 
Hu = Ha = Hn = 0 
2 
Pa 7 (1+2) (n+4) 
n+12 
Ва = a+) (n+4) n6) 


Deduce that if 


zE 
n+2 


(Jenkins, 1954b) 


48.19 Following the previous exercise, if statistics are defined with mean й, e.g. 
p = 12(ш-0), 
show that the characteristic function of p, q, 7 is now 
n-1 Е" 
П (1-0-0 em, aat) 
1 n n 


k= 


and hence that 


1 


= > 
(Jenkins, 1954b, who gives values for higher moments.) 
48.20 For the statistic 7; defined as in (48.42) but with known mean, and therefore with the 


omission of й, show that the c.f., corresponding to (48.80), is the same with the omission of the 
factor in («+8)'. Hence show that odd-order moments vanish and approximately 
_ 13.5... Qk-1) 
Hak = (rE2)(n-4)... Qr 2B* 
Hence verify (48.85) with (n--1) replacing m. 


(Dixon, 1944) 
48.21 Xi Xs, ... Xon+ı are independent random variables, and r is defined as the serial 
correlation coefficient between xs and $(Xs-1+ Xi) s = 1,2,...,7. Show that E(r) = 0, 


var r = (n—1)7*. 
(Moran, 1967; cf. 31.19, Vol. 2, for the ordinary correlation coefficient.) 


CHAPTER 49
SPECTRUM THEORY 


Harmonic analysis 

49.1 In Chapter 47 we have encountered the spectrum and associated functions 
as transforms of the autocorrelation function, but pointed out that they arose naturally 
from a different viewpoint, namely as measures of the closeness of the correlation 
between a time-series and certain harmonic terms. We proceed to develop this 
approach more fully. 

Fourier was led by his studies of heat-flow to consider the expansion of functions
in series of harmonic terms of the type

\[ f(x) = \sum_{r=1}^{\infty} a_r\sin rx + \tfrac12 b_0 + \sum_{r=1}^{\infty} b_r\cos rx. \tag{49.1} \]

Notwithstanding the cyclical character of the individual terms, a very wide class of
non-cyclical functions can be represented in this way over a limited range. It is, for example,
sufficient that, in the range $-\pi$ to $+\pi$, $f(x)$ be single-valued, continuous except for a
finite number of discontinuities, and have only a finite number of maxima or minima,
for such an expansion to be valid. The series on the right in (49.1) is called a Fourier
series. It has the attractive property that successive terms are orthogonal. For

\[ \int_{-\pi}^{\pi}\cos rx\sin sx\,dx = 0, \qquad \int_{-\pi}^{\pi}\sin rx\sin sx\,dx = \int_{-\pi}^{\pi}\cos rx\cos sx\,dx = \begin{cases} 0, & r \neq s, \\ \pi, & r = s. \end{cases} \tag{49.2} \]

Hence, on multiplying (49.1) by $\sin rx$ and by $\cos rx$ and integrating, we find

\[ a_r = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin rx\,dx, \tag{49.3} \]

\[ b_r = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos rx\,dx. \tag{49.4} \]

The series may also be written in the form

\[ f(x) = \sum_{r=0}^{\infty} c_r\sin(rx+\phi_r), \tag{49.5} \]

where $\phi_r$ is a phase angle.

Since all the terms in (49.1), apart from the constant, are of period $2\pi$, the expression
for $f(x)$ has that period. If $f(x)$ is defined over an interval $-L$ to $L$ we may expand
it in terms of $\sin(\pi rx/L)$ and $\cos(\pi rx/L)$. This, of course, is merely a matter of
re-scaling the interval from one of length $2\pi$ to one of length $2L$.
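
The coefficient formulae (49.3) and (49.4) are easily evaluated numerically. The sketch below applies them to $f(x) = \tfrac12 x$ on $(-\pi, \pi)$, whose sine coefficients are known to be $(-1)^{r+1}/r$; the Simpson-rule helper, the function names and the choice of test function are assumptions made for illustration.

```python
import math


def simpson(f, a, b, steps=2000):
    """Composite Simpson rule with an even number of steps."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def fourier_coefficients(f, r):
    """Return (a_r, b_r): sine and cosine coefficients as in (49.3), (49.4)."""
    a_r = simpson(lambda x: f(x) * math.sin(r * x), -math.pi, math.pi) / math.pi
    b_r = simpson(lambda x: f(x) * math.cos(r * x), -math.pi, math.pi) / math.pi
    return a_r, b_r


def half_x(x):
    return 0.5 * x


coefficients = {r: fourier_coefficients(half_x, r) for r in range(1, 5)}
for r, (a_r, b_r) in coefficients.items():
    print(r, round(a_r, 6), round(b_r, 6))
```

Note that, following the convention of (49.1), $a_r$ multiplies the sine terms and $b_r$ the cosine terms; for this odd function the cosine coefficients vanish.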


49.2 Angles, measured as usual in radians, have zero dimensions. Thus the
quantity $\alpha t$ in $\sin\alpha t$ has zero dimension and $\alpha$ is accordingly in radians per time-unit.
It is sometimes called angular frequency. Where no ambiguity is involved we shall
simply call it the frequency. However, $\sin\alpha t$ repeats itself with period $2\pi/\alpha$ and therefore
the number of cycles per time-unit is $\alpha/2\pi$, which may also be regarded as the
frequency. The period $2\pi/\alpha$ is of dimension $t$ and is also called the “wavelength,”
although, in our context, “length” is a period of time.


49.3 It appears, then, that a function may be expanded in a series of sines and
cosines, the successive terms in (49.1) having periods $2\pi$, $2\pi/2$, $2\pi/3$, etc., and the
corresponding angular frequencies being 1, 2, 3, with cycle frequencies $1/2\pi$, $2/2\pi$, $3/2\pi$.
More generally, when $f(x)$ is defined over the interval $2L$, the angular frequencies are
typified by $r\pi/L$. Thus there is one fundamental frequency $\pi/L$ and the others are
integral multiples of it. Such a representation would be rather artificial if we knew that
$f(x)$ was the sum of harmonic components with incommensurable frequencies. We
are thus led to consider the more general harmonic series

\[ f(x) = \sum_j a_j\sin(\alpha_j x) + \sum_j b_j\cos(\alpha_j x), \tag{49.6} \]

where the $\alpha$'s can have any real values. There is now no simple way of evaluating
$a_j$ and $b_j$ such as is given by (49.3) and (49.4). The problem of estimating them was
considered in the nineteenth century by physicists and meteorologists, and although a
great deal of knowledge has now been accumulated, the methods in essence are the
same as those used by earlier authors. However, there has been a change of outlook.
Former authors were looking for concealed harmonics. The more modern approach
is to regard the spectrum as a characteristic of the time-series whether it is truly a sum
of harmonics or not.


Nyquist frequency and aliases 

49.4 For series observed at equal unit intervals of time there are two important
features of harmonic analysis to observe. It is clearly possible for periodicities of less
than one unit to escape notice: for example, if we observe a series every January 1st,
seasonal movements will not be revealed. We need at least two observations in the
year to detect periodicities of one year. Generally, for a time-interval $t_0$ between
observations we cannot measure periods smaller than $2t_0$, or angular frequencies higher
than $\pi/t_0$. This limiting value is known as the Nyquist frequency.

In the spectral density function defined in 47.10 as

\[ w(\alpha) = \sum_{k=-\infty}^{\infty}\rho_k e^{i\alpha k}, \tag{49.7} \]

our time-interval was unity and the range of $\alpha$ is from 0 to $\pi$. The ordinate at $\pi$
represents the value of the spectral density at the Nyquist frequency.


Fig. 49.1. Power spectrum of the Beveridge wheat-price index series (Table 47.1).
For clarity, only frequencies up to 0.24 are included, the remaining part of the spectrum
being negligibly small. The curve is a smoothed spectrum using a Parzen kernel
(Exercise 49.7). See also Fig. 49.4.

Fig. 49.2. Power spectrum of the data of Table 47.2 (marriage rates). Abscissa: frequency
(cycles per year). Maximum frequency at 0.1346 corresponding to a period of 7.4 years.
The dotted line is a smoothed spectrum using a Parzen kernel (Exercise 49.7) and the
first fifteen serial correlations.

49.5 The second effect to remark is also related to the interval of observation.
Suppose that the interval is unity, and consider the term $\sin(2\pi t/3)$ for $t = 1, 2, 3$, etc.
Its values are $\sqrt{3}/2$, $-\sqrt{3}/2$, 0, $\sqrt{3}/2$, etc. But these are also the values which would
be observed for $\sin(8\pi t/3)$ or $\sin(14\pi t/3)$, etc. The width of the interval of observation
does not permit of a distinction between angular frequency $2\pi/3$ or any of the
angular frequencies $2\pi/3 + 2\pi j$, $j = 1, 2, 3$, etc. These higher frequencies are known
as aliases. So far as observation goes they are all equally consonant with the data.
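
The aliasing just described is easily demonstrated. The sketch below samples $\sin\alpha t$ at unit intervals for $\alpha = 2\pi/3$ and for its first three aliases; the helper name and the number of sampled time-points are illustrative assumptions.

```python
import math


def sample(angular_frequency, times):
    """Values of sin(alpha * t) at the given observation times."""
    return [math.sin(angular_frequency * t) for t in times]


times = range(1, 13)              # observations at unit intervals
base = 2 * math.pi / 3            # angular frequency considered in 49.5
aliases = [base + 2 * math.pi * j for j in (1, 2, 3)]

base_values = sample(base, times)
max_gap = max(
    abs(x - y)
    for alias in aliases
    for x, y in zip(base_values, sample(alias, times))
)

nyquist = math.pi / 1.0           # pi / t0 with unit interval t0 = 1
print("first values:", [round(v, 4) for v in base_values[:4]])
print("largest difference between a frequency and its aliases:", max_gap)
print("aliases all exceed the Nyquist frequency:", all(a > nyquist for a in aliases))
```

The sampled values agree to rounding error, so the data alone cannot separate $2\pi/3$ from $8\pi/3$, $14\pi/3$ and so on.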

49.6 In 47.11 we defined

\[ a(\alpha) = \frac{1}{\sqrt{(n\pi)}}\sum_{t=1}^{n} u_t\cos\alpha t, \tag{49.8} \]

\[ b(\alpha) = \frac{1}{\sqrt{(n\pi)}}\sum_{t=1}^{n} u_t\sin\alpha t. \tag{49.9} \]

We showed that the intensity $I(\alpha)$, defined as the sum of squares of $a(\alpha)$ and $b(\alpha)$, was
equal in the limit to the spectral density function $w(\alpha)$ multiplied by $\sigma^2/\pi$. We graphed
$w(\alpha)$ for a Markoff and a Yule scheme in Fig. 47.4 and 47.5. A few practical examples
are given in Fig. 49.1 to 49.3 for comparison with the correlograms of Exercises 47.20
to 47.22 (namely the Beveridge wheat series, the marriage-rate data of Table 47.2, and
the artificial Yule scheme of Table 47.4).

Fig. 49.3. Power spectrum of the data of Table 47.4 (second-order autoregressive
scheme). Ordinate: spectral density $w(\alpha)$; abscissa: frequency (cycles per unit time).
The dotted line is a smoothed spectrum using a Parzen window (Exercise 49.7) and the
first fifteen covariances.

Observational material often presents these wild fluctuations and we shall see
presently why this is so.


49.7 It is sometimes convenient to take as ordinate the logarithm of $w(\alpha)$ rather
than $w(\alpha)$ itself. This avoids over-emphasis of the larger intensities and also has the
advantage, as we shall see later, that the error bands in certain classes of estimation
are of constant width (cf. 49.15). For the most part we use $w(\alpha)$, but some examples
in the next chapter are based on $\log w(\alpha)$.

The sums $a(\alpha)$ and $b(\alpha)$, being weighted sums of variables with the same variance,
will be close to normality for stationary series. We will first consider the behaviour
of the spectrum when harmonic or trend terms are present.


49.8 Suppose that the series $u_t$ consists of a harmonic term with angular frequency
$\alpha$ added to other terms which are not correlated with it:

\[ u_t = c\sin\alpha t + \varepsilon_t. \tag{49.10} \]

We calculate

\[ \sum_{t=1}^{n}\sin\alpha t\,e^{i\beta t} \]

and find that it is equal to

\[
\begin{aligned}
&\frac{\sin\{\tfrac12 n(\alpha-\beta)\}}{2\sin\{\tfrac12(\alpha-\beta)\}}\big[\sin\{\tfrac12(n+1)(\alpha-\beta)\} + i\cos\{\tfrac12(n+1)(\alpha-\beta)\}\big] \\
&\quad + \frac{\sin\{\tfrac12 n(\alpha+\beta)\}}{2\sin\{\tfrac12(\alpha+\beta)\}}\big[\sin\{\tfrac12(n+1)(\alpha+\beta)\} - i\cos\{\tfrac12(n+1)(\alpha+\beta)\}\big].
\end{aligned} \tag{49.11}
\]

In the neighbourhood of $\beta = \alpha$ this is dominated by the first term. The sum
$\sum \varepsilon_t e^{i\beta t}$ is, by hypothesis, of negligible size. Hence the intensity $I(\beta)$ is the sum of
squares of real and imaginary parts in (49.11), multiplied by $c^2/(n\pi)$. We find

\[ I(\beta) = \frac{c^2\sin^2\{\tfrac12 n(\alpha-\beta)\}}{4n\pi\sin^2\{\tfrac12(\alpha-\beta)\}}. \tag{49.12} \]

The corresponding periodogram ordinate is, from (47.30),

\[ S^2(\beta) = \frac{c^2\sin^2\{\tfrac12 n(\alpha-\beta)\}}{n^2\sin^2\{\tfrac12(\alpha-\beta)\}}. \tag{49.13} \]

Now suppose that

\[ \alpha - \beta = \frac{2m\pi}{n}. \tag{49.14} \]

We have then, to a close approximation,

\[ S^2(\beta) = \frac{c^2\sin^2 m\pi}{(m\pi)^2}. \tag{49.15} \]

Hence at $\beta = \alpha$ the periodogram will have a peak of amplitude $c^2$ and this will be
flanked on either side by lesser peaks of diminishing intensity at distances corresponding
to $m = \tfrac32, \tfrac52, \tfrac72$, etc., from it.

A similar effect appears in the power spectrum, except that at $\beta = \alpha$ the ordinate
becomes infinite in theory and may be very large in practice. The reason for choosing
the divisor $\sqrt{(n\pi)}$ in (49.8) and (49.9) rests on the fact that we wish the ordinate to
give the value of the power spectrum or an estimate of it. If $u$ is a purely random
series, in the limit

\[ I(\alpha) = \sigma^2/\pi, \tag{49.16} \]

so that the ordinate of the spectrum is the same for all frequencies. In the periodogram
it would, theoretically, be zero.
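
The closed form (49.11) and the side-band approximation (49.15) can be checked numerically. The sketch below compares the direct sum with the two-term closed form, and evaluates (49.13) at the first side-band; the function names and the values $n = 50$, $\alpha = 1$ are assumptions chosen only for illustration.

```python
import cmath
import math


def direct_sum(alpha, beta, n):
    """Sum over t = 1..n of sin(alpha t) exp(i beta t), term by term."""
    return sum(math.sin(alpha * t) * cmath.exp(1j * beta * t)
               for t in range(1, n + 1))


def closed_form(alpha, beta, n):
    """The two terms of (49.11): one in (alpha - beta), one in (alpha + beta)."""
    def term(d, sign):
        ratio = math.sin(0.5 * n * d) / (2.0 * math.sin(0.5 * d))
        phase = 0.5 * (n + 1) * d
        return ratio * complex(math.sin(phase), sign * math.cos(phase))
    return term(alpha - beta, +1) + term(alpha + beta, -1)


def periodogram_ordinate(alpha, beta, n, c=1.0):
    """Dominant-term periodogram ordinate (49.13)."""
    d = alpha - beta
    return c * c * math.sin(0.5 * n * d) ** 2 / (n * n * math.sin(0.5 * d) ** 2)


n, alpha = 50, 1.0
gap = abs(direct_sum(alpha, 0.9, n) - closed_form(alpha, 0.9, n))

m = 1.5                                   # first side-band, as in (49.14)
beta_side = alpha - 2 * math.pi * m / n
side_exact = periodogram_ordinate(alpha, beta_side, n)
side_approx = math.sin(m * math.pi) ** 2 / (m * math.pi) ** 2   # (49.15)

print("difference between direct sum and (49.11):", gap)
print("side-band height from (49.13) and from (49.15):", side_exact, side_approx)
```

With unit amplitude the first side-band is about one-twentieth of the main peak, which indicates how readily side-bands may be mistaken for minor periodicities.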
49.9 Whether the “side-bands” given by (49.15) show up either in spectrum or
periodogram depends to some extent on the intervals of frequency at which $I$ or $S^2$
are computed. If the periodogram is plotted with period as abscissa (as, in our
definition, it is) the side-bands become wider for increasing period. In fact, let

\[ \alpha = \frac{2\pi}{\lambda_\alpha}, \quad \beta = \frac{2\pi}{\lambda_\beta}. \tag{49.17} \]

Then from (49.14)

\[ \frac{1}{\lambda_\alpha} - \frac{1}{\lambda_\beta} = \frac{m}{n} \]

or approximately

\[ \Delta\lambda = \frac{m\lambda^2}{n}, \tag{49.18} \]

so that the width of the side-band peaks depends on $\lambda$.

Fig. 49.4 gives the periodogram of the Beveridge series for comparison with Fig. 49.1.
The values were calculated by Beveridge, at first on a grid of wavelengths of fairly
equal width, but supplemented by additional values where peaks seemed to be indicated.

Fig. 49.4. Periodogram of the Beveridge wheat-price index series, for comparison
with the power spectrum of Fig. 49.1

Example 49.1
It sometimes avoids tedious summation, and makes the essential point for asymptotic
results, if we replace sums by integrals. For example, with $n$ large,

\[ \sum_{t=1}^{n}\sin\alpha t\sin\beta t \]

may be replaced by

\[ \int_0^T \sin\alpha t\sin\beta t\,dt, \]

where $T$ is the length of the series. The integral is seen to be

\[ \frac12\Big[\frac{\sin\{(\alpha-\beta)T\}}{\alpha-\beta} - \frac{\sin\{(\alpha+\beta)T\}}{\alpha+\beta}\Big]. \]

Likewise, to the same degree of approximation

\[ \sum_{t=1}^{n}\sin\alpha t\cos\beta t = \frac12\Big[\frac{1-\cos\{(\alpha-\beta)T\}}{\alpha-\beta} + \frac{1-\cos\{(\alpha+\beta)T\}}{\alpha+\beta}\Big]. \]

The intensity near $\alpha = \beta$ is then given by

\[ I(\beta) = \frac{c^2\sin^2\{\tfrac12(\alpha-\beta)T\}}{\pi T(\alpha-\beta)^2} \tag{49.19} \]

and the limiting case when $\alpha - \beta$ tends to zero may be discussed as before.


Non-harmonic periodicities 

49.10 It must be remembered that a peak in the spectrum, interpreted as a harmonic,
is only unrelated to other peaks if they all relate to pure sine or cosine terms.
If there is present a periodic term which is not a simple harmonic there may be several
peaks in the spectrum corresponding to it.

Consider a somewhat extreme case in which the periodicity is of the type shown
in Fig. 49.5.

Fig. 49.5 (see text)

This is, in fact, the graph of $\tfrac12 x$ in the range $0 < x < \pi$, continually repeated. Now
we have the Fourier expansion

\[ \tfrac12 x = \sin x - \tfrac12\sin 2x + \tfrac13\sin 3x - \ldots, \quad 0 < x < \pi. \tag{49.20} \]

Thus, in the spectrum there will be peak intensities at frequencies 1, 2, 3, etc. In the
periodogram the intensities would form a series with diminishing amplitudes proportional
to 1, 1/4, 1/9, etc. For non-harmonic periodic elements, therefore, there is always
the possibility of the fundamental frequency being echoed along the spectrum.


Example 49.2

Let us consider what happens if the series has a linear trend in it. In fact, let us
take a pure trend $u_t = t$ and apply spectrum analysis to it. Approximately, as in
Example 49.1, we have

\[ \int_0^T t\sin\alpha t\,dt = \Big[-\frac{t\cos\alpha t}{\alpha}\Big]_0^T + \int_0^T \frac{\cos\alpha t}{\alpha}\,dt = -\frac{T\cos\alpha T}{\alpha} + \frac{\sin\alpha T}{\alpha^2}. \tag{49.21} \]

Likewise,

\[ \int_0^T t\cos\alpha t\,dt = \frac{T\sin\alpha T}{\alpha} - \frac{1-\cos\alpha T}{\alpha^2}. \tag{49.22} \]

Thus the intensity is given by

\[ I(\alpha) = \frac{1}{\pi T}\Big\{\frac{T^2}{\alpha^2} - \frac{2T\sin\alpha T}{\alpha^3} + \frac{2(1-\cos\alpha T)}{\alpha^4}\Big\} = \frac{T}{\pi\alpha^2} + O(1). \tag{49.23} \]

The power spectrum ($T$ constant) would therefore be a curve of type $y = 1/x^2$, with
large intensity at the origin. The periodogram, on the other hand, would have

\[ S^2(\lambda) = \frac{4}{\alpha^2} = \frac{\lambda^2}{\pi^2}, \quad \text{where } \lambda \text{ is the wavelength.} \tag{49.24} \]

The results are understandable in general terms. A trend is like a long wave which
is equivalent to a low frequency. Evidently, if low frequencies are of interest, every
endeavour must be made to remove trend from a series before spectrum methods are
applied.
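
The leading term $T/(\pi\alpha^2)$ of Example 49.2 may be compared with the intensity computed directly from a pure trend. This is a sketch under stated assumptions: the intensity is taken as the squared modulus of the sum $\sum u_t e^{i\alpha t}$ divided by $n\pi$, in line with (49.8) and (49.9), and the values $n = 5000$, $\alpha = 0.1$ are arbitrary illustrative choices.

```python
import cmath
import math


def intensity(series, alpha):
    """I(alpha) = a(alpha)^2 + b(alpha)^2 with the divisor sqrt(n*pi)."""
    n = len(series)
    total = sum(u * cmath.exp(1j * alpha * t)
                for t, u in enumerate(series, start=1))
    return abs(total) ** 2 / (n * math.pi)


n, alpha = 5000, 0.1
trend = list(range(1, n + 1))          # pure trend u_t = t

exact = intensity(trend, alpha)
leading = n / (math.pi * alpha ** 2)   # T / (pi alpha^2), with T = n
ratio = exact / leading

print("intensity from the series :", exact)
print("leading term T/(pi a^2)   :", leading)
print("ratio                     :", ratio)
```

The ratio is close to unity for small $\alpha$ and large $n$, and the intensity grows like $1/\alpha^2$ as $\alpha$ decreases, which is why an unremoved trend swamps the low-frequency end of the spectrum.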


Test for the spectral ordinate 

49.11 Harmonic terms іп a series may be likened to point-densities in a probability 
distribution; in the spectrum they define lines, not continuous densities, although, of 
course, in practice these lines are blurred for finite series. We proceed to consider 
the behaviour of the spectrum for stationary series of the non-deterministic type, which 
can be represented as the weighted sum (finite or infinite) of a series of random variables. 

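Consider first of all the sums $a(\alpha)$ and $b(\alpha)$ of (49.8) and (49.9) when $u_t$ is a random
series with zero autocorrelations and variance $\sigma^2$. Since, for large $n$ and $\alpha \neq 0, \pi$,

\[ \frac{1}{n}\sum_{k=1}^{n}\cos^2\alpha k \quad\text{and}\quad \frac{1}{n}\sum_{k=1}^{n}\sin^2\alpha k \to \tfrac{1}{2}, \tag{49.25} \]

\[ \frac{1}{n}\sum_{k=1}^{n}\cos\alpha k\,\sin\alpha k \to 0, \tag{49.26} \]

we see that $a$, $b$ are independent $N(0, \sigma^2/(2\pi))$ variables. Hence $2\pi I/\sigma^2 = 2\pi(a^2+b^2)/\sigma^2$
is distributed as $\chi^2$ with two degrees of freedom. Equivalently, the sum $S^2$ in the
periodogram is distributed as

\[ dF = \frac{n}{4\sigma^2}\exp\left(-\frac{nS^2}{4\sigma^2}\right) dS^2. \tag{49.27} \]

It follows that for the spectral ordinate, asymptotically,

\[ E(I) = \frac{\sigma^2}{\pi}, \qquad E(w) = 1, \tag{49.28} \]

\[ \operatorname{var} I = \frac{\sigma^4}{\pi^2}, \qquad \operatorname{var} w = 1 = \{E(w)\}^2. \tag{49.29} \]

When $\alpha = 0$ or $\pi$, the variance is doubled.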
Thus for a random series the standard error of the spectral ordinate is of the same 
order of magnitude as the ordinate itself. 

The distribution (49.27) has been used to provide a test for ordinates in the periodogram.
The probability that $S^2$ exceeds some value $4\sigma^2\kappa/n$ is $e^{-\kappa}$. In 1914, G. Walker
pointed out that if $e^{-\kappa}$ is small, the probability that $m$ independent ordinates should
not exceed $4\sigma^2\kappa/n$ is $(1-e^{-\kappa})^m$, so the chance that one at least exceeds that amount is

\[ 1 - (1 - e^{-\kappa})^m. \]

Davis (1941) tabulated this function. Fisher (1929), remarking that the test depended
on $\sigma^2$, effectively Studentized it. Davis also tabulated the function which emerges from
the analysis.
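
Walker's argument can be illustrated numerically. The following Python sketch is not part of the original text: the function names are hypothetical, and it assumes, as Walker's form of the test does, that the variance of the series is known. It evaluates the chance that at least one of m independent ordinates exceeds a given level, and inverts that relation to obtain a critical value for the largest ordinate.

```python
import math

def walker_tail_probability(kappa: float, m: int) -> float:
    """Chance that at least one of m independent periodogram ordinates
    exceeds 4*sigma^2*kappa/n, i.e. 1 - (1 - exp(-kappa))**m."""
    # -expm1(-kappa) equals 1 - exp(-kappa), computed without cancellation.
    p_below = -math.expm1(-kappa)
    return 1.0 - p_below ** m

def walker_threshold(level: float, m: int, sigma2: float, n: int) -> float:
    """Value of S^2 that the largest of m ordinates exceeds with probability
    `level`, assuming a random series with known variance sigma2."""
    kappa = -math.log(1.0 - (1.0 - level) ** (1.0 / m))
    return 4.0 * sigma2 * kappa / n

if __name__ == "__main__":
    # Illustrative values only: 50 ordinates from a series of 200 terms.
    print(walker_tail_probability(5.0, 50))
    print(walker_threshold(0.05, 50, sigma2=1.0, n=200))
```

With a single ordinate the first function reduces to the exponential tail of (49.27); with many ordinates the critical value of kappa grows roughly like the logarithm of m.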


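49.12 It is very remarkable that the result of equation (49.29) is, for large samples,
generally true of stationary series of the non-deterministic type, a result which is
essentially due to Bartlett (see, e.g., his book of 1955). It will be convenient to summarize
(49.8) and (49.9) in a single formula

\[ J(\alpha) = a(\alpha) + i\,b(\alpha) = \frac{1}{\sqrt{\pi n}}\sum_{t=1}^{n} u_t e^{i\alpha t}. \tag{49.30} \]

The expectation of $J(\alpha)$ is zero. If $u_t$ is a random series with zero autocorrelations
and variance $\sigma^2$, we have

\[ E\{J(\alpha)J(\beta)\} = \frac{\sigma^2}{\pi n}\sum_{t=1}^{n} e^{i(\alpha+\beta)t} = \frac{\sigma^2}{\pi n}\,e^{i(\alpha+\beta)}\,\frac{1 - e^{in(\alpha+\beta)}}{1 - e^{i(\alpha+\beta)}}. \tag{49.31} \]

If $\alpha$, $\beta$ are of the form $2\pi p/n$, $p$ integral, this vanishes. Similarly it follows that $J(\alpha)$
is uncorrelated with the complementary $J^*(\beta) = a(\beta) - i\,b(\beta)$. Thus $a(\alpha)$ and $b(\beta)$ are
uncorrelated in this case and the corresponding spectral ordinates are uncorrelated.

We have $I(\alpha) = J(\alpha)J^*(\alpha)$, and putting $\beta = -\alpha$ in (49.31) we confirm the result
that

\[ E\{I(\alpha)\} = \frac{\sigma^2}{\pi}. \]

We have further

\[ E\{I(\alpha)I(\beta)\} = E\left\{\frac{1}{\pi^2n^2}\sum_{t=1}^{n}u_te^{i\alpha t}\sum_{s=1}^{n}u_se^{-i\alpha s}\sum_{k=1}^{n}u_ke^{i\beta k}\sum_{l=1}^{n}u_le^{-i\beta l}\right\} \]
\[ = \frac{1}{\pi^2n^2}\sum E(u_tu_su_ku_l)\exp\{i(\alpha t - \alpha s + \beta k - \beta l)\}. \tag{49.32} \]

The expectations vanish unless $t = s = k = l$ (giving the fourth-order moment of
$u$) or the suffixes are equal in pairs. If $t = s$ and $k = l$ the corresponding term is
$E(I(\alpha))\,E(I(\beta))$. Hence we find

\[ \operatorname{cov}\{I(\alpha), I(\beta)\} = \frac{\kappa_4}{\pi^2n} + \frac{\sigma^4}{\pi^2n^2}\left\{\frac{1-\cos n(\alpha+\beta)}{1-\cos(\alpha+\beta)} + \frac{1-\cos n(\alpha-\beta)}{1-\cos(\alpha-\beta)}\right\}. \tag{49.33} \]

If $\alpha = \beta$ we find, since $(1-\cos n\theta)/(1-\cos\theta) \to n^2$ for $\theta$ small,

\[ \operatorname{var} I(\alpha) = \frac{\sigma^4}{\pi^2} + O(n^{-1}), \tag{49.34} \]

confirming (49.29).
If $u$ is non-normal, the covariance of $I(\alpha)$ and $I(\beta)$ is of order $1/n$. If $u$ is normal
it is of order $1/n^2$ and further is zero if $\alpha$, $\beta$ are of the form $2\pi p/n$, $p$ integral.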


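49.13 Consider now the case when $u_t$ is a weighted average of random variables $\varepsilon$,
say

\[ u_t = \sum_{j=0}^{\infty} g_j\varepsilon_{t-j}. \tag{49.35} \]

We have

\[ J_u(\alpha) = \frac{1}{\sqrt{\pi n}}\sum_{t=1}^{n}u_te^{i\alpha t} = \frac{1}{\sqrt{\pi n}}\sum_{t=1}^{n}\sum_{j=0}^{\infty}g_j\varepsilon_{t-j}e^{i\alpha t} \]
\[ = \frac{1}{\sqrt{\pi n}}\sum_{j=0}^{\infty}g_je^{i\alpha j}\sum_{t=1}^{n}\varepsilon_{t-j}e^{i\alpha(t-j)} = J_\varepsilon(\alpha)\,h(\alpha), \text{ approximately,} \tag{49.36} \]

where $h(\alpha)$ is the transform of $g$, namely

\[ h(\alpha) = \sum_{j=0}^{\infty}g_je^{i\alpha j}. \tag{49.37} \]

We have at once

\[ I_u(\alpha) = I_\varepsilon(\alpha)\,h(\alpha)\,h^*(\alpha), \tag{49.38} \]

which is another form of the result obtained for the effect of a transfer function in
47.24 in the context of a continuous series. Further we obtain

\[ E(I_u(\alpha)) = h(\alpha)\,h^*(\alpha)\,E(I_\varepsilon(\alpha)) \tag{49.39} \]

and asymptotically,

\[ \operatorname{var} I_u(\alpha) = [E(I_u(\alpha))]^2, \tag{49.40} \]

which, as at (49.29), is doubled at $\alpha = 0$ or $\pi$.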


Smoothing the spectrum 

49.14 These results provide us with a novel problem in estimation. The observed
ordinate in the spectrum for a series of length $n$ does not have a variance of order $1/n$,
but of order $w^2$. Furthermore, the ordinates for values of $\alpha$ equal to $2\pi p/n$ are
uncorrelated (exactly for normal variation and approximately otherwise). The observed
spectrum will thus fluctuate violently (Fig. 49.1 is a good example) and is a most
unreliable estimator of the parent spectrum.

We shall attempt to overcome this difficulty by smoothing the spectrum, replacing
$I(\alpha)$ by a weighted sum of neighbouring ordinates. This will render the estimator
well-behaved in the sense of having a small variance, but to obtain such a result we
have to pay a price in the form of bias in the estimator itself.
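
The failure of the raw ordinate to settle down as the series lengthens is easily seen by simulation. The sketch below is an illustration only, not the authors' procedure: it assumes a normal random series with unit variance, takes the intensity as the squared modulus of the sum in (49.30), and uses hypothetical function names. It estimates the mean and variance of one ordinate for two series lengths.

```python
import numpy as np

def intensity(u, alpha):
    """I(alpha) = |sum_t u_t exp(i*alpha*t)|^2 / (pi*n), with t = 1..n."""
    n = len(u)
    t = np.arange(1, n + 1)
    j = np.sum(u * np.exp(1j * alpha * t)) / np.sqrt(np.pi * n)
    return float(np.abs(j) ** 2)

def ordinate_moments(n, reps=2000, seed=0):
    """Monte Carlo mean and variance of I(alpha) for a unit-variance normal
    random series of length n, at the harmonic frequency nearest pi/2."""
    rng = np.random.default_rng(seed)
    alpha = 2.0 * np.pi * (n // 4) / n
    vals = np.array([intensity(rng.standard_normal(n), alpha)
                     for _ in range(reps)])
    return vals.mean(), vals.var()

if __name__ == "__main__":
    for n in (64, 512):
        mean, var = ordinate_moments(n)
        # Both stay near 1/pi and 1/pi^2 whatever the value of n.
        print(n, round(mean, 3), round(var, 3))
```

The variance stays close to the square of the mean for both lengths, which is the behaviour that motivates smoothing.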


49.15 Let us take a weighting function $h(u)$ obeying the conditions

\[ h(u) = h(u + 2\pi), \tag{49.41} \]

\[ \int_{-\pi}^{\pi} h(u)\,du = 1. \tag{49.42} \]

This function is variously known as a “kernel” or “spectral window.” If $I(\alpha)$
is the estimated intensity we construct the smoothed function

\[ I_s(\alpha) = \int_{-\pi}^{\pi} h(u)\,I(\alpha - u)\,du = \int_{-\pi}^{\pi} h(\alpha - u)\,I(u)\,du. \tag{49.43} \]

If $I(\alpha)$ is unbiassed we see, on taking expectations, that $I_s(\alpha)$ will, in general, be
biassed, being a weighted average. To reduce the bias we desire $h(u)$ to be concentrated
in a narrow range in the neighbourhood of $u = 0$, in which case the integral on
the right in (49.43) will give an approximately unbiassed result. Unless $h(u)$ is, trivially,
the unit function at $u = 0$ there will, however, be some loss of resolution. A highly
concentrated $h(u)$ may be thought of as possessing an effective range which is much
narrower than the full range of definition $-\pi$ to $+\pi$, and this effective range is sometimes
known as the “bandwidth” of the spectral window.

We may approximate to the integral (49.43) by a sum

\[ I_s(\alpha) = \frac{2\pi}{n}\sum_j h(u_j)\,I(\alpha - u_j), \qquad u_j = 2\pi j/n. \tag{49.44} \]

Since the values of $I$ are independent we then have

\[ \operatorname{var} I_s(\alpha) = \frac{4\pi^2}{n^2}\sum_j h^2(u_j)\operatorname{var} I(\alpha - u_j) \]

and using (49.40), this is approximately

\[ \frac{4\pi^2}{n^2}\sum_j h^2(u_j)\,I^2(\alpha - u_j) = \frac{2\pi}{n}\int_{-\pi}^{\pi} h^2(u)\,I^2(\alpha - u)\,du. \tag{49.45} \]

If $h(u)$ is concentrated in a narrow bandwidth this will give us, approximately,

\[ \operatorname{var} I_s(\alpha) = \frac{2\pi}{n}\,I^2(\alpha)\int_{-\pi}^{\pi} h^2(u)\,du, \tag{49.46} \]

again doubled at $\alpha = 0, \pi$. Thus, provided that the integral is bounded, the variance
is now of the order of $1/n$. It also follows that, to the same degree of approximation,
$\operatorname{var}\log I_s(\alpha)$ is a constant, and hence that $\log I_s$ has confidence intervals of constant
width.

It may also be shown that the correlation between $I_s(\alpha)$ and $I_s(\beta)$ is approximately

\[ \int_{-\pi}^{\pi} h(u)\,h(u + \alpha - \beta)\,du \Big/ \int_{-\pi}^{\pi} h^2(u)\,du, \tag{49.47} \]

and as this is positive for any acceptable weighting function the use of the word
“smoothing” is justified.
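
A numerical sketch may make the trade concrete. It is an illustration under stated assumptions rather than the text's own procedure: the window is the simplest possible one (equal weights over 2m+1 neighbouring harmonic ordinates, wrapped round in frequency), the series is a normal random series, and the function names are hypothetical.

```python
import numpy as np

def raw_intensities(u):
    """Ordinates I(alpha_p) at alpha_p = 2*pi*p/n, p = 0..n-1, taken as
    |discrete Fourier transform|^2 / (pi*n)."""
    n = len(u)
    return np.abs(np.fft.fft(u)) ** 2 / (np.pi * n)

def smooth_rectangular(intens, m):
    """Replace each ordinate by the mean of itself and its m neighbours on
    either side, treating the frequency axis as circular."""
    if m == 0:
        return np.array(intens, dtype=float)
    weights = np.ones(2 * m + 1) / (2 * m + 1)
    padded = np.concatenate([intens[-m:], intens, intens[:m]])
    return np.convolve(padded, weights, mode="valid")

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    raw = raw_intensities(rng.standard_normal(1024))
    smoothed = smooth_rectangular(raw, 10)
    # The scatter of the ordinates about their mean falls sharply.
    print(raw.var(), smoothed.var())
```

For a random series the parent spectrum is flat, so averaging introduces no bias here; with a peaked spectrum the same averaging would blur the peak, which is the price referred to above.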


Calculation of spectra 


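49.16 For the calculation of spectral ordinates in practice we do not work out
the sums (49.8) and (49.9) for varying values of $\alpha$ and then compute the intensity or
the spectral density. In fact, we have

\[ I(\alpha) = \frac{1}{\pi n}\left[\left(\sum_{t=1}^{n}u_t\cos\alpha t\right)^2 + \left(\sum_{t=1}^{n}u_t\sin\alpha t\right)^2\right] \]
\[ = \frac{1}{\pi n}\sum_{t=1}^{n}\sum_{s=1}^{n}u_tu_s(\cos\alpha t\cos\alpha s + \sin\alpha t\sin\alpha s) \]
\[ = \frac{1}{\pi}\sum_{k=-(n-1)}^{n-1}c_k\cos k\alpha, \tag{49.48} \]

where $c_k$ is a covariance-type expression defined by

\[ c_k = \frac{1}{n}\sum_{t=1}^{n-k}u_tu_{t+k}. \tag{49.49} \]

For infinite $n$ this reduces to the known expression (cf. (47.27))

\[ I(\alpha) = \frac{\sigma^2}{\pi}w(\alpha) = \frac{\sigma^2}{\pi}\sum_{k=-\infty}^{\infty}\rho_k\cos k\alpha. \tag{49.50} \]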
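
The identity between the squared trigonometric sums and the cosine transform of the serial covariances can be checked directly. The following sketch is illustrative only; the function names are hypothetical, and the covariances are taken about zero with divisor n, as in (49.49).

```python
import numpy as np

def intensity_direct(u, alpha):
    """Intensity from the cosine and sine sums, divided by pi*n."""
    n = len(u)
    t = np.arange(1, n + 1)
    a = np.sum(u * np.cos(alpha * t))
    b = np.sum(u * np.sin(alpha * t))
    return (a * a + b * b) / (np.pi * n)

def serial_covariances(u):
    """c_k = (1/n) * sum_{t=1}^{n-k} u_t u_{t+k}, for k = 0..n-1."""
    n = len(u)
    return np.array([np.dot(u[: n - k], u[k:]) / n for k in range(n)])

def intensity_from_covariances(u, alpha):
    """(1/pi) * sum over k = -(n-1)..(n-1) of c_k cos(k*alpha),
    using c_{-k} = c_k."""
    c = serial_covariances(u)
    k = np.arange(1, len(u))
    return (c[0] + 2.0 * np.sum(c[1:] * np.cos(k * alpha))) / np.pi

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u = rng.standard_normal(50)
    print(intensity_direct(u, 0.7), intensity_from_covariances(u, 0.7))
```

The two routes agree to rounding error for any series and any frequency, since the double sum over t and s depends only on the lag t - s.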


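Calculation of the spectrum usually proceeds from (49.48). In any case we cannot
compute $c_k$ for $k > n-1$ and in practice would rarely wish to go as far as $n-1$ serial
correlations. Let us then consider the estimator

\[ I_s(\alpha) = \frac{1}{\pi}\sum_{k=-q}^{q}\lambda_kc_k\cos k\alpha, \tag{49.51} \]

based on $q$ serial correlations. (The $\lambda$'s are constants to be chosen at convenience for
the purpose of improving the estimator.) This is equivalent, in parental form, to

\[ I_s(\alpha) = \frac{\sigma^2}{\pi}\sum_{k=-q}^{q}\lambda_k\rho_k\cos k\alpha \]

or

\[ w_s(\alpha) = \sum_{k=-q}^{q}\lambda_k\rho_k\cos k\alpha. \tag{49.52} \]

But from (47.22)

\[ \rho_k = \frac{1}{\pi}\int_0^{\pi}w(u)\cos ku\,du, \]

and hence on substitution in (49.52) we find

\[ w_s(\alpha) = \frac{1}{\pi}\sum_{k=-q}^{q}\lambda_k\int_0^{\pi}w(u)\cos ku\cos k\alpha\,du \]
\[ = \int_{-\pi}^{\pi}w(u)\left\{\frac{1}{2\pi}\sum_{k=-q}^{q}\lambda_k\cos ku\cos k\alpha\right\}du. \tag{49.53} \]

The use of (49.51) is then, asymptotically, equivalent to smoothing the spectrum by
the weighting function

\[ h(\beta) = \frac{1}{2\pi}\sum_{k=-q}^{q}\lambda_k\cos k\beta\cos k\alpha. \tag{49.54} \]

Provided that $\lambda_0 = 1$, this obeys conditions (49.41) and (49.42).
We also have

\[ \int_{-\pi}^{\pi}h^2(\beta)\,d\beta = \frac{1}{2\pi}\sum_{k=-q}^{q}\lambda_k^2\cos^2k\alpha. \tag{49.55} \]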
Example 49.3 
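Suppose, in the first instance, we take all $\lambda$'s equal to unity. Then

\[ h(\beta) = \frac{1}{2\pi}\sum_{k=-q}^{q}\cos k\beta\cos k\alpha = \frac{1}{4\pi}\sum_{k=-q}^{q}\{\cos k(\alpha+\beta) + \cos k(\alpha-\beta)\} \]
\[ = \frac{1}{4\pi}\left[\frac{\sin\{(q+\tfrac{1}{2})(\alpha+\beta)\}}{\sin\tfrac{1}{2}(\alpha+\beta)} + \frac{\sin\{(q+\tfrac{1}{2})(\alpha-\beta)\}}{\sin\tfrac{1}{2}(\alpha-\beta)}\right] \tag{49.56} \]

with

\[ \int_{-\pi}^{\pi}h^2(\beta)\,d\beta = \frac{1}{4\pi}\left[2q+1 + \frac{\sin\{(2q+1)\alpha\}}{\sin\alpha}\right]. \tag{49.57} \]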
Example 49.4 (Bartlett, 1950) 
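Let

\[ \lambda_k = 1 - \frac{|k|}{q}. \tag{49.58} \]

We find

\[ h(\beta) = \frac{1}{4\pi}\left[\frac{\sin^2\{\tfrac{1}{2}q(\alpha+\beta)\}}{q\sin^2\{\tfrac{1}{2}(\alpha+\beta)\}} + \frac{\sin^2\{\tfrac{1}{2}q(\alpha-\beta)\}}{q\sin^2\{\tfrac{1}{2}(\alpha-\beta)\}}\right] \tag{49.59} \]

and

\[ \int_{-\pi}^{\pi}h^2(\beta)\,d\beta = \frac{1}{4\pi}\left[1 + \frac{(q-1)(2q-1)}{3q} + \frac{1}{q\sin^2\alpha}\left\{1 - \frac{\sin 2q\alpha\cos\alpha}{2q\sin\alpha}\right\}\right]. \tag{49.60} \]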


Example 49.5 (Daniell, 1946) 
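Take

\[ \lambda_k = \frac{\sin kh}{kh}, \qquad h > 0. \tag{49.61} \]

We have the known integrals (for $p > 0$)

\[ \int_0^{\infty}\frac{\sin px\cos qx}{x}\,dx = \begin{cases}\tfrac{1}{2}\pi, & |p| > |q|,\\ \tfrac{1}{4}\pi, & |p| = |q|,\\ 0, & |p| < |q|.\end{cases} \tag{49.62} \]

The weighting function is then given by

\[ h(\beta) = \frac{1}{2\pi}\sum_{k=-q}^{q}\frac{\cos\beta k\cos\alpha k\sin hk}{hk}, \]

which is approximated by the integral

\[ \frac{1}{2\pi h}\int_0^{\infty}\frac{\sin hx}{x}\{\cos(\alpha+\beta)x + \cos(\alpha-\beta)x\}\,dx = \begin{cases}\dfrac{1}{4h}, & h > \alpha-\beta > -h,\\ 0 & \text{elsewhere.}\end{cases} \tag{49.63} \]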
Various other kernels have been suggested, notably by Blackman and Tukey (1958) 
and Parzen (1961). See Exercises 49.5-7 and a review by Jenkins (1961). 
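
The variance factor (49.55) is easily evaluated for any choice of weights, which gives a quick way of comparing kernels. The sketch below is an illustration only, with hypothetical function names; it covers the equal weights of Example 49.3 and the linearly decreasing weights of Example 49.4, and assumes the weights are symmetric in k.

```python
import math

def integral_h_squared(lam, alpha):
    """Right-hand side of (49.55): (1/2pi) * sum_{k=-q}^{q} lam_k^2 cos^2(k*alpha).
    `lam` lists lam_0, ..., lam_q; lam_{-k} = lam_k is assumed."""
    q = len(lam) - 1
    total = lam[0] ** 2
    for k in range(1, q + 1):
        total += 2.0 * (lam[k] * math.cos(k * alpha)) ** 2
    return total / (2.0 * math.pi)

def truncated_weights(q):
    """All weights equal to unity (Example 49.3)."""
    return [1.0] * (q + 1)

def bartlett_weights(q):
    """Weights 1 - k/q (Example 49.4)."""
    return [1.0 - k / q for k in range(q + 1)]

if __name__ == "__main__":
    q, alpha = 8, 0.9
    print(integral_h_squared(truncated_weights(q), alpha))
    print(integral_h_squared(bartlett_weights(q), alpha))
```

Since the variance of the smoothed ordinate is proportional to this integral, tapering the weights lowers the variance for a given q, at the cost of a wider effective window.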


Estimation of spectral densities 

49.17 The problem of estimating spectral densities has received a great deal of 
attention, and a complete account of the subject (which is itself by no means complete) 
would occupy more space than we can allot to it. We must content ourselves with a 
summary account of the principles. 

The object of the estimation is to provide good estimates of the ordinates along the 
length of the power spectrum. (This is not usually the ultimate object of the analysis, 
a point which is apt to be overlooked.) We have seen that the ideal may be unattainable
for various reasons. We therefore introduce the kernel or spectral window to smooth
out the grosser irregularities in the observed spectrum. A “good” kernel will be
relatively narrow in range, but no kernel can be perfect, and its values may be unduly
influenced by casual peaks in the spectrum; there is then said to be “leakage” round
the edges of the “spectral window” which we are using to scrutinize part of the
spectrum. We are thus led to consider the effectiveness of different kernels in smoothing,
and hence introducing reliability, as against averaging, and hence introducing bias.
For some of the procedures possible in this context reference may be made to Whittle 
(1957), who considers a prior distribution of spectral ordinates, Blackman and Tukey 
(1958), who discuss the use of prior analysis of the data, and Parzen (1961), who con- 
siders, among other things, the prior determination of the rate of decay of the 
autocorrelations. 


49.18 Leakage causes trouble, especially in estimates of the low part of the
spectrum, for the kernel, though itself small in the outlying parts of its range, may
swamp the average when multiplied by a high value of the spectral ordinate. For
this reason Blackman and Tukey (1958) introduced a process known as pre-whitening,
the object of which is to filter the series so that the peaks are flattened out. For example,
if the original spectrum has a peak at $\alpha_0$ and we can transform the original series so
that this peak is flattened out, the estimate at some other point $\alpha_1$ will no longer be
distorted by it. This is obviously a rather dangerous procedure, but fortunately we
can afterwards recolour the spectrum, in the terminology of Nerlove (1964). The
basic idea rests on the result we proved in 47.24, that if $v(t)$ is a filtered series derived
from $u(t)$ by a linear filter, then

\[ w_v(\alpha) = w_u(\alpha)\,|f(\alpha)|^2, \tag{49.64} \]

where $f(\alpha)$ is the transfer function of the filter itself. Knowing the filter, we can
always recover the spectrum of the original series from the estimated spectrum of the
transformed series. The procedure has been examined by Hext (1964). It may well
be desirable to use different procedures for different parts of the spectrum.
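
The filter-then-recolour idea can be sketched in a few lines. The example is hypothetical in every particular not stated above: it assumes the simplest pre-whitening filter, a first difference with coefficient phi, and the function names are ours. The only result taken from the text is that the spectrum of the filtered series is that of the original multiplied by the squared modulus of the transfer function.

```python
import numpy as np

def prewhiten(u, phi):
    """Filtered series v_t = u_t - phi*u_{t-1} (an assumed, illustrative filter)."""
    u = np.asarray(u, dtype=float)
    return u[1:] - phi * u[:-1]

def transfer_power(alpha, phi):
    """|f(alpha)|^2 for the filter above, i.e. |1 - phi*exp(-i*alpha)|^2."""
    return 1.0 - 2.0 * phi * np.cos(alpha) + phi ** 2

def recolour(spectrum_v, alpha, phi):
    """Recover the spectrum of u from that of v by dividing out |f|^2.
    This breaks down where |f| vanishes (phi = 1 at alpha = 0)."""
    return np.asarray(spectrum_v) / transfer_power(np.asarray(alpha), phi)

if __name__ == "__main__":
    alpha = np.linspace(0.1, 3.0, 5)
    # A spectrum with a strong low-frequency peak, flattened exactly by phi = 0.5.
    w_u = 1.0 / transfer_power(alpha, 0.5)
    w_v = w_u * transfer_power(alpha, 0.5)
    print(w_v)                         # flat after filtering
    print(recolour(w_v, alpha, 0.5))   # original shape restored
```

In practice the spectrum of the filtered series would be estimated with a smoothing kernel before being recoloured; leakage then operates on the flattened spectrum, where it does less harm.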


49.19 Daniels (1962) develops some alternative approaches. He considers first 
of all a preliminary smoothing by a kernel chosen so as to obey a given criterion, e.g. so 
as to achieve the minimum tolerable resolution. Then he unsmooths the spectrum by 
setting up a routine which improves the resolution at the expense of the sampling 
variance until no further useful change is detectable in the fitted spectrum. Two 
unsmoothing processes are discussed, one approximating the spectrum locally by a 
polynomial, the other based on differences of the spectrum. The process is empirical 
in the sense that it uses the data to determine the estimator, and it requires a good 
deal of computation, but it at least proceeds by successive approximation to a stable 
solution. 


49.20 In connexion with equation (49.64) we may note some work by Hannan
(1960) and Durbin (1961) concerning the effect of seasonal variation and trend-elimination
on the spectrum. We have already remarked on the problems created by the
elimination of trend in distorting the residuals. With (49.64) we can regenerate the
spectrum of the residuals undistorted by trend-removal, at least in theory. But this
does not, of course, mean that we can regenerate the residuals themselves.


49.21 It must always be remembered that we are not interested in the periodogram 
or the power spectrum for its own sake, except perhaps in the domain of physics or 
electrical engineering, where an ordinate in the spectrum can be given a physical inter- 
pretation as the amount of power which is obtainable at a given cycle frequency. For 
general statistical purposes the spectrum is a diagnostic instrument whose main use 
is to suggest an appropriate model to generate the series under observation. Interest 
therefore tends to be focussed on the testing of hypotheses concerning the model, 
rather than testing particular ordinates in a correlogram or a spectrum, and this is a 
subject we consider in the next chapter. For more extensive studies of spectrum analysis 
we may refer to the books by Blackman and Tukey (1958), Grenander and Rosenblatt 
(1957), and Granger and Hatanaka (1964), and the symposium edited by Rosenblatt 
(1963). 


Unequal time-intervals 

49.22 Finally we may add a few comments on a point of some practical importance
where daily or monthly observations are concerned. Suppose we have a series $u_1, u_2$,
etc., observed at intervals of $m$, so that our information consists of observations $u(m)$,
$u(2m)$, etc. For example, we may observe a daily series once a month, in which case
$m$ is, on the average, about 30. Suppose further that the intervals between observations
now vary about $m$ to some extent, so that we have instead observations $u(m+\varepsilon_1)$,
$u(2m+\varepsilon_2), \dots$, etc. If $u$ has zero mean and unit variance, which we may assume
without loss of generality, the autocorrelations of the original observations are given by

\[ \rho(km) = E[u(tm)\,u\{(t+k)m\}], \qquad k = 0, 1, \dots. \tag{49.65} \]

Those of the second series are, say $\rho^*(km)$, given by

\[ \rho^*(km) = E\left[\frac{1}{n}\sum_{p=1}^{n}u(pm+\varepsilon_p)\,u\{(p+k)m+\varepsilon_{p+k}\}\right] = \frac{1}{n}\sum_{p=1}^{n}\rho(km+\varepsilon_{p+k}-\varepsilon_p). \tag{49.66} \]

Expanding $\rho$ either as a Taylor series or an equivalent series of differences, we then
have, to the second order in $\varepsilon$,

\[ \rho^*(km) = \rho(km) + \frac{1}{n}\sum_{p=1}^{n}(\varepsilon_{p+k}-\varepsilon_p)\,\rho'(km) + \frac{1}{2n}\sum_{p=1}^{n}(\varepsilon_{p+k}-\varepsilon_p)^2\,\rho''(km). \tag{49.67} \]

If $\varepsilon$ has zero mean the second term is small and vanishes in the limit. Writing $\sigma^2$ for
the variance of $\varepsilon$ and $\tau_k$ for its $k$th autocorrelation, we then have

\[ \rho^*(km) = \rho(km) + \sigma^2(1-\tau_k)\,\rho''(km). \tag{49.68} \]

On the average, then, the autocorrelations are not seriously disturbed so long as $\sigma^2$
is small.

For example, consider observations made on the first of the month instead of daily.
The average month-length, taking account of leap years, is 30.437 with a variance of
0.70. The first autocorrelation $\tau_1$ is $-0.42$, and the second is positive. On our scale
$\sigma^2 = 0.70/(30.437)^2 = 0.00076$. The autocorrelations based on the first of the month
will then, from (49.68), be only slightly affected on average, provided that $\rho''$, or the
second difference of $\rho$, is not very large, which is so.

Similar arguments apply to the power spectrum. Low frequencies are emphasized
slightly, but the effect is negligible.
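
The size of the correction for monthly sampling is quickly put into figures. The sketch below is illustrative: the function names are hypothetical, and the values of the autocorrelation and its second derivative used in the demonstration are invented for the purpose, not taken from any series in the text.

```python
def interval_variance_on_unit_scale(var_days, mean_days):
    """Variance of the interval error when time is measured in units of
    the mean interval m."""
    return var_days / mean_days ** 2

def disturbed_autocorrelation(rho, rho_second_derivative, sigma2, tau_k):
    """Second-order approximation (49.68):
    rho* = rho + sigma^2 * (1 - tau_k) * rho''."""
    return rho + sigma2 * (1.0 - tau_k) * rho_second_derivative

if __name__ == "__main__":
    sigma2 = interval_variance_on_unit_scale(0.70, 30.437)
    print(sigma2)  # about 0.00076
    # Hypothetical rho = 0.5 and rho'' = -0.1 at lag 1, with tau_1 = -0.42.
    print(disturbed_autocorrelation(0.5, -0.1, sigma2, -0.42))
```

Even a second derivative of order unity would move the autocorrelation by only about one part in a thousand.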


49.23 The matter stands differently for series which are aggregated, such as rainfall.
The effect of differing time-intervals may then be serious, as is fairly evident
when we remember that we may be comparing sums based (in the case of months and
days) on 28 or 31 observations. It is, in our opinion, essential in such cases to
standardize the data by reducing them to a period of constant length. This is particularly
true for such data as output per working week or inputs per working month.

Granger (1963) has discussed the matter in more detail from the spectrum viewpoint. 
See also Quenouille (1958). 


EXERCISES 


49.1 $P(t)$ is a polynomial of degree $k$ for $0 \le t \le T$. Show that asymptotically the ordinate in
the periodogram corresponding to frequency $\alpha$ is given by

\[ \frac{4P^2(T)}{\alpha^2T^2} + O(T^{2k-3}). \]


49.2 A series has the value $e^{ct}$ in the range 0 to $T$, $c > 0$. Show that asymptotically the
ordinate in the periodogram corresponding to frequency $\alpha$ is

\[ \frac{4e^{2cT}}{(c^2+\alpha^2)T^2}. \]


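49.3 Given that

\[ x = \frac{4}{\pi}\left(\sin x - \frac{\sin 3x}{3^2} + \frac{\sin 5x}{5^2} - \dots\right), \qquad -\tfrac{1}{2}\pi < x < \tfrac{1}{2}\pi, \]

graph the series whose term is $x$ over the range 0 to $4\pi$. Compare with 49.10 and comment on
the effects on the power spectrum.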


49.4 Establish equation (49.47). 


49.5 In (49.54) take

\[ \lambda_k = 1 - 2a + 2a\cos(\pi k/q). \]

Show that

\[ h(\beta) = \frac{1}{4\pi}\left[(1-2a)\frac{\sin(q+\tfrac{1}{2})y}{\sin\tfrac{1}{2}y} + a\left\{\frac{\sin(q+\tfrac{1}{2})(y+\pi/q)}{\sin\tfrac{1}{2}(y+\pi/q)} + \frac{\sin(q+\tfrac{1}{2})(y-\pi/q)}{\sin\tfrac{1}{2}(y-\pi/q)}\right\}\right], \]

where $y = \alpha - \beta$, together with a similar term obtained by putting $y = \alpha + \beta$.
(Tukey, cf. Blackman and Tukey, 1958. They
propose the values $a = 0.25$ or $a = 0.23$.)

49.6 In a similar manner to the previous exercise, with

\[ \lambda_k = 1 - k^2/q^2, \]
show that 


h 1) + зіп gy cos ty а cos qy 
Э sin? dy віп? фу J^ 
(Parzen, 1961) 


49.7 As in the previous exercise, with 


з з 
а= i) «) ‚ 0<k< M, 
9, 4, 


з 
E ( -) з ы<®<а, 


" LI 
show that к= za (iis tva 
4л4 | sin {у 
(Parzen, 1961) 


49.8 If ш is stationary and normal, and у, = j/uj/, Ya = u4/p3— 3, show that asymptoti- 
cally y, is N(0, R,) with 


© 
R-6ZEj 
=% 
and that y, is N(0, №,) with 
Ri = 24 X D. 
Show how this may be used to test for normality of a stationary process.
(Lomnicki, 1961) 


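49.9 The Buys-Ballot table. A series of $p\mu$ terms is written down in $p$ rows of $\mu$ thus:

          u_1            u_2            ...   u_mu
          u_{mu+1}       u_{mu+2}       ...   u_{2mu}
          ...
          u_{(p-1)mu+1}  u_{(p-1)mu+2}  ...   u_{p mu}
   Sums:  m_1            m_2            ...   m_mu

Show that the sums $A$ and $B$ entering into the periodogram are given by

\[ A = \frac{2}{p\mu}\sum_{j=1}^{\mu}m_j\cos\frac{2\pi j}{\mu}, \]

\[ B = \frac{2}{p\mu}\sum_{j=1}^{\mu}m_j\sin\frac{2\pi j}{\mu}. \]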

49.10 With reference to the previous exercise, consider 


n? (u) = var m/var u. 
Show that if 
uy = asin (22j/2) +Ы;, 
where b; is uncorrelated with periodic terms, then 


1% sin? віп? (nz/2) nu = 
2п* sin? (илд) += Evaro) / (te +vard), 


qu) = 
Hence show that, in the neighbourhood of $\lambda$, the graph of $\eta$ as ordinate against $\mu$ as abscissa
(Whittaker’s periodogram) has a peak of breadth 22*/m, flanked by smaller peaks. 
(Whittaker, 1911) 


49.11 If an autoregressive series of Yule type (47.74) is subject to errors of observation 
which are independent from term to term, show that the serial correlations (except ro) are reduced 
in constant proportion, say $c$. Hence, if $\alpha$, $\beta$ in

\[ u_{t+2} + \alpha u_{t+1} + \beta u_t = \varepsilon_{t+2} \]
are estimated from the Yule-Walker equations (47.66) as a’, b’, show that, for the series subject 
to error, 
Ы Ја b/a, 
where а, Ь refer to estimates for the same series not subject to error. 


49.12 The following are the spectral densities computed for the series of Table 47.4 and
Fig. 49.3 for smoothing with a Parzen window (Exercise 49.7) and various truncation points in
the number $q$ of serial correlations computed. Sketch the power spectra and note the
disturbing effect of having $q$ too large.

   Frequency           No           Parzen window, three truncation points
   (cycles per year)   smoothing    (smallest q)  (intermediate q)  (largest q)
   0                    0.0000       19.3065       15.5732           11.6349
   0.0156              10.4916       20.0120       18.2743           15.8810
   0.0313              39.5419       21.8959       21.9328           24.0449
   0.0469              18.6496       24.3610       25.1102           28.3488
   0.0625              40.5257       26.6535       27.2150           26.1922
   0.0781               1.4554       28.0290       29.1099           25.9544
   0.0938              34.9011       27.8621       30.4033           33.5149
   0.1094              52.4760       25.8092       28.9764           35.8784
   0.1250              36.6309       22.0280       23.7428           25.2362
   0.1406               2.7163       17.2623       16.5376           13.2443
   0.1563               3.7282       12.6006       10.4166            7.8444
   0.1719               9.6669        8.9806        6.9051            5.9028
   0.1875               2.5688        6.7729        5.5586            4.6738
   0.2031               0.0968        5.7301        5.3329            4.8532
   0.2188              16.3082        5.2859        5.4431            5.9667
   0.2344               5.5837        4.9314        5.3382            6.0461
   0.2500               7.7572        4.4191        4.7363            4.9161
   0.2656               1.3151        3.7474        3.7738            3.5804
   0.2813               0.4748        3.0386        2.8106            2.5929
   0.2969               2.4491        2.4260        2.0975            1.8886
   0.3125               0.9332        1.9872        1.7045            1.3929
   0.3281               1.9702        1.7203        1.5886            1.3985
   0.3438               0.7401        1.5573        1.6005            1.7491
   0.3594               4.0405        1.4114        1.5402            1.8142
   0.3750               0.3481        1.2300        1.3106            1.3800
   0.3906               0.4831        1.0181        0.9851            0.8537
   0.4063               0.5796        0.8206        0.7045            0.5774
   0.4219               0.1827        0.6847        0.5519            0.4695
   0.4375               0.4529        0.6311        0.5288            0.4463
   0.4531               0.4774        0.6486        0.5974            0.5478
   0.4688               0.1512        0.7041        0.7097            0.7256
   0.4844               1.1829        0.7582        0.8112            0.8687
   0.5000               1.8403        0.7801        0.8521            0.9190


CHAPTER 50
TIME-SERIES: SOME FURTHER TOPICS


50.1 The theory of time-series has not reached a stage, and may never reach a
stage, at which a clearly structured account of it can be given. To some extent this is
due to the complicated nature of the subject: we have to take account not only of
probability distributions but of their autocorrelations over time, and the embarrassing
profusion of parameters which results may make it difficult to choose among a sizeable
set of different hypotheses which are all consonant with the data. In some fields,
especially in economics, experiences are rarely long enough to enable us to lean as
heavily on our models as we can, for example, in physics. A run of fifty years’ data is
“long” as such series go, and even if longer, may arise from a system which is itself
undergoing important structural change.


50.2 The advent of the electronic computer has removed most of the tedium which 
was a serious obstacle to former workers on time-series analysis, but there remain the 
problems of formulating and testing hypotheses or of setting up a model of the system 
under study. For this reason, a working statistician very often needs to call in aid a 
great deal of extraneous information of a non-statistical, perhaps a non-quantifiable 
kind, in order to define his problem and to set up his models. We shall not attempt 
a review of the considerations and methods to which he must have regard in this part 
of his work. We take them for granted, and in this final chapter shall consider the 
purely statistical aspects of the subject: estimation and hypothesis testing, multivariate 
extensions, and some related questions concerning identifiability and mixed regressive- 
autoregressive systems. 


Estimation 

50.3 We begin by emphasizing some general points which are peculiar to time- 
series analysis and, in one form or another, bedevil attempts to reach exact results in 
problems of estimation or hypothesis testing. 

Consider the likelihood function of an autoregressive series. It will make for clarity
if we discuss a Markoff scheme, although the argument is general. For a set of
observations $u_1, u_2, \dots, u_n$ we have

\[ u_1 = \rho u_0 + \varepsilon_1, \quad u_2 = \rho u_1 + \varepsilon_2, \quad \dots, \quad u_n = \rho u_{n-1} + \varepsilon_n. \tag{50.1} \]

If the probability distribution of the $\varepsilon$'s were known, say as $f(\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n)$, we might
regard (50.1) as determining a variate transformation to new variables $u_1, u_2, \dots, u_n$.
But here we encounter a difficulty in that $u_0$ is also involved. We have, in fact, $n$
variables $\varepsilon$ and $n+1$ variables $u$. Let us then add to (50.1) the supplementary equation

\[ u_0 = u_0. \tag{50.2} \]

We know that $u_0$ is dependent only on $\varepsilon_0, \varepsilon_{-1}$, etc. and hence is independent of $\varepsilon_1$ to
$\varepsilon_n$. If its frequency function is $g(u_0)$ we then have for the joint distribution of $u_0,
\varepsilon_1, \dots, \varepsilon_n$

\[ dF = f(\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n)\,g(u_0)\,d\varepsilon_1\,d\varepsilon_2 \dots d\varepsilon_n\,du_0. \tag{50.3} \]

Let us now make the transformation to variables $u_0, u_1, \dots, u_n$. The Jacobian is
easily seen to be unity, and we find

\[ dF = f(u_1 - \rho u_0,\; u_2 - \rho u_1,\; \dots,\; u_n - \rho u_{n-1})\,g(u_0)\,du_0\,du_1 \dots du_n. \tag{50.4} \]

To manipulate this expression for the purpose of deriving estimators or tests we require
to dismiss the element $u_0$, which is unknown. (If it were known we should have started
the series of observations with it.) There are several ways of doing this, but they all
involve some sort of limitation on our inference:

(a) we may assume $u_0$ known and make the inference conditional upon it;
(b) we may make the sample circular, i.e. assume that $u_0 = u_n$;
(c) we may neglect $u_0$ by showing that asymptotically its effect is negligible.


50.4 The method we shall consider is the third. Suppose, for example, that the
$\varepsilon$'s are distributed normally with unit variance. Then $u_0$ will be normal with variance
$1/(1-\rho^2)$, and for $\log L$ we have, apart from constants,

\[ \log L = \tfrac{1}{2}\log(1-\rho^2) - \tfrac{1}{2}\sum_{j=1}^{n}(u_j - \rho u_{j-1})^2 - \tfrac{1}{2}(1-\rho^2)u_0^2. \tag{50.5} \]

The term involving $u_0$ is seen to be $-\tfrac{1}{2}u_0^2 + \rho u_0u_1$ and we integrate this out to obtain

\[ \log L = \text{const.} + \tfrac{1}{2}\log(1-\rho^2) - \tfrac{1}{2}\sum_{j=2}^{n}(u_j - \rho u_{j-1})^2 - \tfrac{1}{2}(1-\rho^2)u_1^2. \tag{50.6} \]

For large $n$ the summation dominates $\log L$, and asymptotically we have

\[ \log L \sim -\tfrac{1}{2}\sum_{j=2}^{n}(u_j - \rho u_{j-1})^2. \tag{50.7} \]

We can estimate $\rho$ by maximizing this likelihood, which is equivalent to minimization
of a sum of squares over $(n-1)$ terms. Apart from the approximation, the results are
what we would have got by treating the autoregression as an ordinary regression.
The ML estimator is then

\[ \hat\rho = \sum_{j=2}^{n}u_ju_{j-1} \Big/ \sum_{j=2}^{n}u_{j-1}^2. \]

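50.5 The same point may be made in a different way. The variance of $u_j$ is
$1/(1-\rho^2)$ and the correlation of $u_i$ and $u_j$ is $\rho^{|i-j|}$. Thus the dispersion matrix of
$u_1, u_2, \dots, u_n$ is

\[ \frac{1}{1-\rho^2}\begin{pmatrix} 1 & \rho & \rho^2 & \dots & \rho^{n-1}\\ \rho & 1 & \rho & \dots & \rho^{n-2}\\ \vdots & & & & \vdots\\ \rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \dots & 1 \end{pmatrix}. \tag{50.8} \]

The determinant is found to be $1/(1-\rho^2)$ and the inverse is

\[ \begin{pmatrix} 1 & -\rho & 0 & \dots & 0 & 0\\ -\rho & 1+\rho^2 & -\rho & \dots & 0 & 0\\ 0 & -\rho & 1+\rho^2 & \dots & 0 & 0\\ \vdots & & & & & \vdots\\ 0 & 0 & 0 & \dots & -\rho & 1 \end{pmatrix}. \tag{50.9} \]

Hence, for normal variation, the log likelihood, apart from constants, is

\[ \tfrac{1}{2}\log(1-\rho^2) - \tfrac{1}{2}\left\{u_1^2 + u_n^2 + (1+\rho^2)\sum_{j=2}^{n-1}u_j^2 - 2\rho\sum_{j=2}^{n}u_ju_{j-1}\right\} \]
\[ = \tfrac{1}{2}\log(1-\rho^2) - \tfrac{1}{2}\sum_{j=2}^{n}(u_j - \rho u_{j-1})^2 - \tfrac{1}{2}(1-\rho^2)u_1^2, \tag{50.10} \]

which brings us back to (50.6).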
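
The determinant and the tridiagonal inverse of the dispersion matrix of a Markoff scheme may be verified numerically for any particular size. The sketch below is illustrative only and uses hypothetical function names; it builds the dispersion matrix with elements rho to the power |i-j|, divided by 1 - rho squared, and the proposed inverse, and leaves the comparison to the reader or to a test.

```python
import numpy as np

def markoff_dispersion(rho, n):
    """Dispersion matrix with (i, j) element rho^|i-j| / (1 - rho^2)."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - rho ** 2)

def markoff_inverse(rho, n):
    """Tridiagonal inverse: diagonal 1, 1+rho^2, ..., 1+rho^2, 1 and
    -rho immediately above and below the diagonal."""
    inv = np.diag(np.full(n, 1.0 + rho ** 2))
    inv[0, 0] = inv[-1, -1] = 1.0
    off = np.full(n - 1, -rho)
    return inv + np.diag(off, 1) + np.diag(off, -1)

if __name__ == "__main__":
    v = markoff_dispersion(0.4, 6)
    print(np.allclose(v @ markoff_inverse(0.4, 6), np.eye(6)))
    print(np.linalg.det(v), 1.0 / (1.0 - 0.4 ** 2))
```

That only the two corner elements of the inverse differ from the interior diagonal is what makes the end-effects in the likelihood so easy to isolate.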


50.6 A second general point to notice concerns the relationship between
autoregressive and moving-average schemes. We have already remarked in 47.18 that an
autoregressive scheme is equivalent to a moving average of infinite extent. It might,
then, be thought that a moving average, being actually of finite extent, would have
simpler estimational properties. This turns out not to be so. We can illustrate the
point by reference to the scheme

\[ u_t = \varepsilon_t + \beta\varepsilon_{t-1}. \tag{50.11} \]

The dispersion matrix of the series (with unit variance for $\varepsilon$) is

\[ \begin{pmatrix} 1+\beta^2 & \beta & 0 & \dots & 0\\ \beta & 1+\beta^2 & \beta & \dots & 0\\ 0 & \beta & 1+\beta^2 & \dots & 0\\ \vdots & & & & \vdots\\ 0 & 0 & 0 & \dots & 1+\beta^2 \end{pmatrix}. \tag{50.12} \]

With $\beta = -\rho$ this is nearly the same as (50.9), but the difference is not negligible and
(50.12) is not so easy to invert. Consider, however, (50.8) with $\rho = -\beta$, namely

\[ \frac{1}{1-\beta^2}\begin{pmatrix} 1 & -\beta & \beta^2 & \dots & (-\beta)^{n-1}\\ -\beta & 1 & -\beta & \dots & (-\beta)^{n-2}\\ \vdots & & & & \vdots\\ (-\beta)^{n-1} & (-\beta)^{n-2} & (-\beta)^{n-3} & \dots & 1 \end{pmatrix}. \tag{50.13} \]

If, in (50.11), we modify the model slightly so that $\varepsilon_0$ is zero, then $\operatorname{var} u_1 = 1$.
Likewise if $\varepsilon_n$ has variance $1-\beta^2$, $\operatorname{var} u_n = 1$. The other values are unaffected and
hence (50.13) represents the inverse of the dispersion matrix of the modified scheme,
which clearly is asymptotically the same as the scheme (50.11) since only two end
terms have been altered.
Thus, to this degree of approximation, the log likelihood is given by

\[ \log L = \text{const.} - \tfrac{1}{2}\log(1-\beta^2) - \frac{1}{2(1-\beta^2)}\left\{\sum_{j=1}^{n}u_j^2 - 2\beta\sum_{j=1}^{n-1}u_ju_{j+1} + 2\beta^2\sum_{j=1}^{n-2}u_ju_{j+2} - \dots\right\}. \tag{50.14} \]

In this expression all the observed serial covariances are involved, whereas for the
autoregressive scheme we need only as many serial covariances as there are constants
to estimate. Even if we neglect the terms outside braces in (50.14) we are still left with
a cumbrous likelihood function to manage, and in particular the ML equations are
intractable.


Example 50.1 
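It might be supposed that the difficulty could be overcome by using a different
estimator. For example, we have for the first autocorrelation of (50.11)

\[ \rho_1 = \beta/(1+\beta^2). \tag{50.15} \]

It is plausible, then, to estimate $\beta$ by solving

\[ r_1 = \frac{b}{1+b^2}. \tag{50.16} \]

But unfortunately, as Whittle (1953a) showed, this is a very inefficient estimator.
In fact, for the asymptotic variance of $r_1$ (equation (48.9)) we have

\[ n\operatorname{var} r_1 = \sum_{j=-\infty}^{\infty}(\rho_j^2 + \rho_{j+1}\rho_{j-1} - 4\rho_1\rho_j\rho_{j+1} + 2\rho_1^2\rho_j^2), \]

which, in our present case, reduces to

\[ n\operatorname{var} r_1 = 1 - 3\left(\frac{\beta}{1+\beta^2}\right)^2 + 4\left(\frac{\beta}{1+\beta^2}\right)^4. \tag{50.17} \]

From (50.15) we have

\[ \frac{d\rho_1}{d\beta} = \frac{1-\beta^2}{(1+\beta^2)^2}, \tag{50.18} \]

and hence, asymptotically,

\[ n\operatorname{var} b = \frac{(1+\beta^2)^4 - 3\beta^2(1+\beta^2)^2 + 4\beta^4}{(1-\beta^2)^2}. \tag{50.19} \]

For example, with $\beta = \tfrac{1}{2}$ we find

\[ n\operatorname{var} b = \frac{389}{144}. \tag{50.20} \]

Taking $\log L$ in the form

\[ -\frac{1}{2(1-\beta^2)}\left\{\sum u_j^2 - 2\beta\sum u_ju_{j+1} + 2\beta^2\sum u_ju_{j+2} - \dots\right\} = -\frac{nA}{2(1-\beta^2)}, \text{ say,} \]

we find

\[ E(A) = 1-\beta^2, \qquad E\left(\frac{\partial A}{\partial\beta}\right) = -2\beta, \]

\[ \frac{\partial^2\log L}{\partial\beta^2} = -\frac{n}{2}\left[A\left\{\frac{2}{(1-\beta^2)^2} + \frac{8\beta^2}{(1-\beta^2)^3}\right\} + \frac{4\beta}{(1-\beta^2)^2}\frac{\partial A}{\partial\beta} + \frac{1}{1-\beta^2}\frac{\partial^2A}{\partial\beta^2}\right], \]

whence, the expectation of $\partial^2A/\partial\beta^2$ being zero,

\[ E\left(\frac{\partial^2\log L}{\partial\beta^2}\right) = -\frac{n}{1-\beta^2}, \]

and thus the variance of the ML estimator is given (cf. (18.60)) by

\[ \operatorname{var}\hat\beta = \frac{1-\beta^2}{n}. \tag{50.21} \]

For $\beta = \tfrac{1}{2}$ this reduces to $3/(4n)$ and comparison with (50.20) shows that the estimator
$b$ from (50.16) has a variance 3.6 times the optimum value.
The result is unexpected but easy to understand. The estimator of (50.16) uses
only the first serial correlation and forfeits the information in the other serials which,
as we have seen, all appear in the likelihood function.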
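
The arithmetic of the comparison can be reproduced exactly with rational numbers. The sketch below is illustrative, with hypothetical function names; it takes the asymptotic variance of the first serial correlation for the scheme (50.11) as 1 - 3 rho_1^2 + 4 rho_1^4, divides by the square of the derivative of rho_1 with respect to beta, and sets the result against the asymptotic variance of the ML estimator.

```python
from fractions import Fraction

def n_var_moment_estimator(beta):
    """n * var(b) for b solving r_1 = b/(1 + b^2), by the delta method."""
    rho1 = beta / (1 + beta ** 2)
    n_var_r1 = 1 - 3 * rho1 ** 2 + 4 * rho1 ** 4
    slope = (1 - beta ** 2) / (1 + beta ** 2) ** 2   # d(rho_1)/d(beta)
    return n_var_r1 / slope ** 2

def n_var_ml(beta):
    """n * var of the ML estimator, 1 - beta^2."""
    return 1 - beta ** 2

if __name__ == "__main__":
    beta = Fraction(1, 2)
    moment, ml = n_var_moment_estimator(beta), n_var_ml(beta)
    print(moment, ml, float(moment / ml))  # 389/144, 3/4, about 3.6
```

Evaluating the two functions at other values of beta shows that the loss of efficiency grows rapidly as beta approaches unity.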


Estimation in autoregressive series 
50.7 For the general linear autoregressive series

\[ \sum_{j=0}^{k}\alpha_ju_{t-j} = \varepsilon_t \tag{50.22} \]

the same kind of argument as we used in 50.4 shows that, with the usual neglect of
end-effects, the ML estimators in normal variation are given by minimizing

\[ \sum_t\left(\sum_{j=0}^{k}\alpha_ju_{t-j}\right)^2, \]

which gives rise to the Yule-Walker equations (47.66). Or equivalently, we can treat
the estimation problem as one in ordinary regression.

The basic theorem on this subject is due to Mann and Wald (1943) who proved
rigorously that asymptotically the sampling properties of least-squares estimators $\hat\alpha$ of $\alpha$
are the same as those of least-squares regression estimators in multivariate normal
systems. This useful result is enough for most practical purposes. Experimental
studies on series generated by rectangularly distributed $\varepsilon$'s, and for moderate length $n$
of 60 terms, indicate that the Yule-Walker equations can safely be used in such cases,
though it is better to correct the estimates of autocorrelations for bias by Quenouille’s
method (cf. 48.4).


50.8 The kind of hypothesis concerning generating schemes which we mostly wish 
to test concerns the comparison of an autoregressive scheme of order k with one of 
order k+1. That is to say, if we assume that the series is autoregressive, how far do 
we carry the regressions? From what has been said it will be evident that we can go on 
fitting extra terms in the regression until there is no appreciable diminution in the 
sum of squares. In fact, autoregressive fitting is rather simpler than ordinary regression 
because we do not have to face the usual problems of how to reject “ insignificant ” 
variables when the regressors are of mixed types. 


Example 50.2 
In Table 45.4 we gave a series of figures for the sheep population of England and
Wales from 1867 to 1939. Fig. 45.4 indicates that the downward trend in these figures
is approximately linear. In Table 50.1 we show the residuals in this series after the
elimination of trend by a simple nine-point moving average.

Table 50.1. Residual values of the sheep series of Table 45.4 after elimination of trend
by a simple nine-point moving average

   Year   Residual (000)     Year   Residual (000)     Year   Residual (000)
   1871       -176           1893       + 34           1915       + 19
   1872       -112           1894       -103           1916       +128
   1873       + 50           1895       -104           1917       + 97
   1874       +141           1896       - 15           1918       + 69
   1875       + 60           1897       - 23           1919       - 29
   1876       - 20           1898       + 17           1920       -174
   1877       + 12           1899       + 71           1921       -107
   1878       + 82           1900       + 35           1922       -142
   1879       +130           1901       + 16           1923       -109
   1880       - 14           1902       - 27           1924       - 23
   1881       -166           1903       - 32           1925       + 60
   1882       -179           1904       - 49           1926       +121
   1883       - 84           1905       - 61           1927       + 94
   1884       + 38           1906       - 52           1928       - 25
   1885       + 97           1907       - 24           1929       - 90
   1886       +  8           1908       + 68           1930       - 75
   1887       -  5           1909       +141           1931       + 72
   1888       -105           1910       +119           1932       +152
   1889       - 99           1911       + 66           1933       +112
   1890       - 35           1912       - 52           1934       - 64
   1891       +159           1913       -117           1935       - 87
   1892       +167           1914       - 61

We have to consider how far this residual series can be represented by an expression
of the form

\[ u_t = f(u_{t-1}, u_{t-2}, \dots) + \varepsilon_t. \]


The first ten serial correlations are as follows:

    Order of                      Order of
    correlation k    $r_k$        correlation k    $r_k$
         1                             6           0.144
         2                             7           0.203
         3                             8           0.118
         4                             9           0.006
         5                            10          -0.078
                                                            (50.23)


We first of all consider what order of linear autoregressive scheme would be required.
This is most easily decided in terms of partial correlations of $u_t$ with $u_{t-k}$, eliminating all
intervening observations, and the corresponding multiple correlations determined by
(27.61). We find:

    Value of k    Partial r of lag k    $\prod (1 - r^2) = 1 - R^2$
        1               0.595                  0.6460
        2              -0.782                  0.2509
        3               0.097                  0.2485
        4              -0.183                  0.2402
        5               0.031                  0.2400
        6               0.014                  0.2400

There is apparently no appreciable gain in representation to be obtained by taking
a linear autoregression of order greater than two. Note that the high values of $|r_3|$,
$|r_4|$ disappear upon partialling, whereas the small $|r_2|$ is replaced by the largest
partial $|r|$.

We might, however, wish to examine the question whether curvilinear terms might
improve the autoregression fit (even at the expense of rendering the model non-stationary).
This is most clearly decided by drawing the scatter diagrams of $u_t$ on $u_{t-1}$, and
of $u_t$ on $u_{t-2}$, which are shown in Fig. 50.1. There is no sign, to the eye at least, of
curvilinearity in this scatter of variation. We conclude that, so far as autoregressive
representation is possible, it is adequate to take the Yule scheme

    $$ u_t = -\alpha_1 u_{t-1} - \alpha_2 u_{t-2} + \varepsilon_t, $$    (50.24)

in which the variance of $\varepsilon$ is about 25 per cent (i.e. the value of $1 - R^2$ above;
cf. (27.56)) of the variance of u.
The constants $\alpha_1$ and $\alpha_2$ are easily estimated using (47.77-8) as

    $$ -a_1 = \frac{r_1 (1 - r_2)}{1 - r_1^2} = 1.060, \qquad -a_2 = \frac{r_2 - r_1^2}{1 - r_1^2} = -0.782, $$

and the autoregressive equation is

    $$ u_t = 1.060\, u_{t-1} - 0.782\, u_{t-2}. $$    (50.25)
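
The arithmetic of Example 50.2 can be retraced from two entries of the table of partial correlations: the partial r of lag 1 (which is the first serial correlation itself) and the partial r of lag 2. The following is a small sketch, assuming the usual second-order Yule-Walker relations; the variable names are illustrative and do not come from the text.

```python
r1 = 0.595          # partial r of lag 1, equal to the first serial correlation
p2 = -0.782         # partial r of lag 2

# For a second-order scheme the lag-2 partial is (r2 - r1^2) / (1 - r1^2),
# so the second serial correlation implied by the table is
r2 = p2 * (1.0 - r1 ** 2) + r1 ** 2

minus_a1 = r1 * (1.0 - r2) / (1.0 - r1 ** 2)       # coefficient of u_{t-1}
minus_a2 = (r2 - r1 ** 2) / (1.0 - r1 ** 2)        # coefficient of u_{t-2}
one_minus_R2 = (1.0 - r1 ** 2) * (1.0 - p2 ** 2)   # residual variance / var u

print(round(minus_a1, 3), round(minus_a2, 3), round(one_minus_R2, 4))
```

To the figures quoted, this agrees with the coefficients 1.060 and -0.782 and with the value 0.2509 of 1 - R^2 for k = 2.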


Test of fit for autoregressive schemes
50.9 It so happens that for autoregressive schemes the partial autocorrelations can be
obtained directly, a fact which was used by Quenouille (1947b) to provide an ingenious
test of fit.
Corresponding to (50.22) consider a variable $\eta_t$ defined by

    $$ \sum_{j=0}^{k} \alpha_j u_{t+j} = \eta_t, $$    (50.26)

where the u's go forward, so to speak, instead of backward in time.

We have

    $$ \operatorname{cov}(\eta_t, \eta_{t+l}) = E\Big\{ \Big( \sum_{i=0}^{k} \alpha_i u_{t+i} \Big) \Big( \sum_{j=0}^{k} \alpha_j u_{t+l+j} \Big) \Big\} = \sum_{i=0}^{k} \sum_{j=0}^{k} \alpha_i \alpha_j \gamma_{l+j-i}, $$

where $\gamma_p$ is the pth autocovariance of $u_t$,

    $$ = \sum_{j=0}^{k} \alpha_j \sum_{i=0}^{k} \alpha_i \gamma_{l+j-i}. $$    (50.27)


Fig. 50.1. Scatter diagrams of $u_t$ and $u_{t-1}$ (upper diagram) and $u_t$ and $u_{t-2}$ (lower diagram)
for sheep data (Example 50.2)


The second summation on the right vanishes, in virtue of the Yule-Walker equations
(47.66), for $l > 0$. The same result holds for $l < 0$. When $l = 0$ we find

    $$ \operatorname{var} \eta_t = \operatorname{var} \varepsilon_t. $$    (50.28)

Thus, if $\varepsilon$ is a normal random variable, so is $\eta$, with the same variance. $\eta_t$ depends
on $\varepsilon_{t+k}$ and previous $\varepsilon$'s and is therefore independent of $\varepsilon_{t+k+l}$ for $l > 0$.
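Consider now quantities q defined by

    $$ q_j = \frac{1}{n} \sum_{t=1}^{n} \varepsilon_{t+j}\, \eta_t. $$    (50.29)

We have

    $$ E(q_j) = 0, \qquad j > k, $$    (50.30)
    $$ \operatorname{var} q_j = \frac{1}{n} E(\varepsilon_{t+j}^2)\, E(\eta_t^2) = \frac{1}{n} (\operatorname{var} \varepsilon)^2, \qquad j > k, $$    (50.31)
    $$ \operatorname{cov}(q_j, q_{j+l}) = 0, \qquad l > 0. $$    (50.32)

Define the quantities

    $$ \omega_j = \frac{q_j}{\operatorname{var} u} = \frac{1}{n \operatorname{var} u} \sum_{t=1}^{n} \varepsilon_{t+j}\, \eta_t. $$    (50.33)

Then each $\omega_j$ has zero mean, variance equal to $(\operatorname{var} \varepsilon / \operatorname{var} u)^2 / n$, and is uncorrelated with
the other $\omega$'s.

We observe that $\varepsilon_{t+j}$ is $u_{t+j}$ after removal by regression of the terms $u_{t+j-1}, \ldots, u_{t+j-k}$.
On the other hand, $\eta_t$ is $u_t$ after the removal of $u_{t+1}, \ldots, u_{t+k}$. Thus, for
$j > k$ the correlation between $\varepsilon_{t+j}$ and $\eta_t$, namely $\omega_j$, is the partial correlation of terms
in the series distance j apart.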

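For large samples we have, from (50.22) and (50.33), using sample values for the
serial correlations,

    $$ \omega_j \approx \frac{1}{n \operatorname{var} u} \sum_t (\alpha_0 u_{t+j} + \ldots + \alpha_k u_{t+j-k}) (\alpha_0 u_t + \ldots + \alpha_k u_{t+k}) $$
    $$ = A_0 r_j + A_1 r_{j-1} + \ldots + A_{2k} r_{j-2k}, $$    (50.34)

where the A's are given by

    $$ A_i = \sum_{l=0}^{i} \alpha_l \alpha_{i-l}, $$    (50.35)

taking $\alpha_l = 0$ for $l > k$.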


Example 50.3
For the Yule scheme (47.74) we find, from (50.35),

    $$ A_0 = 1, \quad A_1 = 2\alpha_1, \quad A_2 = \alpha_1^2 + 2\alpha_2, \quad A_3 = 2\alpha_1 \alpha_2, \quad A_4 = \alpha_2^2, $$

and hence, asymptotically,

    $$ \omega_j = r_j + 2\alpha_1 r_{j-1} + (\alpha_1^2 + 2\alpha_2) r_{j-2} + 2\alpha_1 \alpha_2 r_{j-3} + \alpha_2^2 r_{j-4}, \qquad j > 2, $$    (50.36)

is distributed with variance

    $$ \frac{1}{n} \left[ \frac{(1 - \alpha_2) \{ (1 + \alpha_2)^2 - \alpha_1^2 \}}{1 + \alpha_2} \right]^2. $$    (50.37)

In Example 50.2 we found, for the series of 65 terms of residuals in the sheep series,
$a_1 = -1.060$, $a_2 = 0.782$.
Substitution in (50.36) then gives

    $$ \omega_j = r_j - 2.120\, r_{j-1} + 2.688\, r_{j-2} - 1.658\, r_{j-3} + 0.612\, r_{j-4}, $$

with variance $9.69 \times 10^{-4}$.

Considered as estimators of partial correlations in series of moderate length these
quantities are rather indifferent, being affected by sampling fluctuations or casual
errors. They rest on the assumption, of course, that we can use sample estimators
of the $\alpha$'s. We find

    $$ \omega_3 = 0.025, \quad \omega_4 = -0.043, \quad \omega_5 = -0.001, $$

with a standard error of 0.031, and reach the same conclusion as in Example 50.2,
that a second-order scheme is sufficient to account for the data.
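
The numerical coefficients and the standard error quoted in Example 50.3 can be checked with a few lines of code. The sketch below forms the A's by convolving the sequence (1, a_1, a_2) with itself and evaluates the variance from the ratio var e / var u of the Yule scheme; the function names are illustrative, and the closing check uses the exact autocorrelations of the fitted scheme (not the observed serial correlations) to confirm that omega_j vanishes for j > 2.

```python
import math

def quenouille_weights(alpha):
    """Coefficients A_i: the sequence (alpha_0, ..., alpha_k) convolved with itself."""
    k = len(alpha) - 1
    return [sum(alpha[l] * alpha[i - l]
                for l in range(max(0, i - k), min(i, k) + 1))
            for i in range(2 * k + 1)]

def omega(r, A, j):
    """omega_j = sum_i A_i r_{j-i}, using r_{-s} = r_s; r[0] must be 1."""
    return sum(A[i] * r[abs(j - i)] for i in range(len(A)))

a1, a2, n = -1.060, 0.782, 65          # estimates and series length of Example 50.2
A = quenouille_weights([1.0, a1, a2])

# var e / var u for the Yule scheme, written in terms of the coefficients
ratio = (1.0 - a2) * ((1.0 + a2) ** 2 - a1 ** 2) / (1.0 + a2)
variance = ratio ** 2 / n
std_error = math.sqrt(variance)

# Exact autocorrelations of the scheme u_t + a1 u_{t-1} + a2 u_{t-2} = e_t
rho = [1.0, -a1 / (1.0 + a2)]
for s in range(2, 8):
    rho.append(-a1 * rho[s - 1] - a2 * rho[s - 2])

print([round(x, 3) for x in A], round(variance, 6), round(std_error, 3))
print([round(omega(rho, A, j), 12) for j in (3, 4, 5)])
```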


Moving-average processes 

50.10 The difficulties we remarked upon in estimation of the constants in the
pure moving-average process (50.11) are obviously intensified when averages of greater
extent are concerned. Asymptotic expressions for the likelihood may be derived, but
the ML equations are extremely cumbrous. We shall describe a method due to
Durbin (1959b) which, in effect, turns the problem into one of autoregression.

Consider, in fact, the simple model (50.11). This is equivalent to the infinite
autoregression

    $$ u_t - \beta u_{t-1} + \beta^2 u_{t-2} - \ldots = \varepsilon_t. $$    (50.38)

Compare this with the finite autoregressive scheme of order k

    $$ u_t + \alpha_1 u_{t-1} + \alpha_2 u_{t-2} + \ldots + \alpha_k u_{t-k} = \varepsilon_t $$    (50.39)

with $\alpha_r = (-\beta)^r$.

The difference lies in the remainder after k+1 terms of the autoregression:

    $$ (-\beta)^{k+1} u_{t-k-1} + (-\beta)^{k+2} u_{t-k-2} + \ldots $$
    $$ = (-\beta)^{k+1} (u_{t-k-1} - \beta u_{t-k-2} + \ldots) $$
    $$ = (-\beta)^{k+1} \varepsilon_{t-k-1}. $$    (50.40)

The variance of this term is $\beta^{2k+2} \operatorname{var} \varepsilon$, and for $|\beta| < 1$ this tends rapidly to zero as
k grows larger. Consequently the representation (50.39) can be made as close to
(50.38) as we like by taking k sufficiently large (but small compared to n).
Let $a_1, a_2, \ldots, a_k$ be the least-squares estimators of $\alpha_1, \alpha_2, \ldots, \alpha_k$ in (50.39). From
the Mann-Wald theorem of 50.7, and (19.16), we know that the $(a - \alpha)$'s are asymptotically
normal with zero mean and dispersion matrix equal to $V_k^{-1}/n$, where $V_k \operatorname{var} \varepsilon$ is the
dispersion matrix of the regressor variables, namely of $u_{t-1}, u_{t-2}, \ldots, u_{t-k}$. This is
given by the matrix of (50.12). Hence for the asymptotic distribution of $a_1, \ldots, a_k$ we
have

    $$ dF = \left( \frac{n}{2\pi} \right)^{k/2} |V_k|^{1/2} \exp\Big[ -\tfrac{1}{2} n \Big\{ (1 + \beta^2) \sum_{j=1}^{k} (a_j - \alpha_j)^2 + 2\beta \sum_{j=1}^{k-1} (a_j - \alpha_j)(a_{j+1} - \alpha_{j+1}) \Big\} \Big]\, da_1 \ldots da_k. $$    (50.41)


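The expression in curly brackets, say Q, is the essential part of the likelihood function,
since $|V_k|$ is $(1 - \beta^{2k+2})/(1 - \beta^2)$ (cf. Exercise 50.1). We can simplify Q to some
extent. Consider the Yule-Walker equations (47.66) in the form

    $$ \gamma_0 \alpha_1 + \gamma_1 \alpha_2 + \ldots + \gamma_{k-1} \alpha_k = -\gamma_1, $$
    $$ \gamma_1 \alpha_1 + \gamma_0 \alpha_2 + \ldots + \gamma_{k-2} \alpha_k = -\gamma_2, $$    (50.42)
    $$ \ldots $$
    $$ \gamma_{k-1} \alpha_1 + \gamma_{k-2} \alpha_2 + \ldots + \gamma_0 \alpha_k = -\gamma_k. $$

Putting $\gamma_0 = (1 + \beta^2)\sigma^2$, $\sigma^2 = \operatorname{var} \varepsilon$, $\gamma_1 = \beta\sigma^2$, we have

    $$ (1 + \beta^2)\alpha_1 + \beta\alpha_2 = -\beta, $$
    $$ \beta\alpha_{r-1} + (1 + \beta^2)\alpha_r + \beta\alpha_{r+1} = 0, \qquad r = 2, 3, \ldots, k-1, $$    (50.43)
    $$ \beta\alpha_{k-1} + (1 + \beta^2)\alpha_k = 0. $$

Multiplying these equations by $-2a_j + \alpha_j$ $(j = 1, \ldots, k)$ in turn and adding, we find
the expression

    $$ Q = (1 + \beta^2) \sum_{j=1}^{k} a_j^2 + 2\beta \sum_{j=1}^{k-1} a_j a_{j+1} + 2\beta a_1 - \beta \alpha_1. $$    (50.44)

Since for large k, $\alpha_1$ is nearly equal to $-\beta$, this gives, on putting $a_0 = 1$,

    $$ Q = (1 + \beta^2) \sum_{j=0}^{k} a_j^2 + 2\beta \sum_{j=0}^{k-1} a_j a_{j+1} - 1. $$    (50.45)

The estimator of $\beta$ is now given by differentiating Q and equating to zero in the usual
way, which gives us

    $$ b = -\sum_{j=0}^{k-1} a_j a_{j+1} \Big/ \sum_{j=0}^{k} a_j^2. $$    (50.46)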


50.11 This estimator is easily computed from the a's, which in turn are derivable
without difficulty from a regression routine. Durbin (1959b) showed, moreover, that
to the first order in $n^{-1}$ (cf. Exercise 50.6)

    $$ \operatorname{var} b \approx \frac{2}{n\, \partial^2 Q / \partial \beta^2}. $$    (50.47)

In the present case

    $$ \frac{\partial^2 Q}{\partial \beta^2} = 2 \sum_{j=0}^{k} a_j^2, $$

which, for large k, tends to $2 \sum_{j=0}^{\infty} (-\beta)^{2j} = 2/(1 - \beta^2)$. Thus for sufficiently large k,

    $$ \operatorname{var} b = \frac{1 - \beta^2}{n}, $$    (50.48)

and the estimator b is asymptotically efficient.
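
The two-stage character of this estimator (fit a long autoregression, then form b from its coefficients) can be sketched as follows. This is a sketch under stated assumptions rather than Durbin's own computation: the autoregression is fitted here through the Yule-Walker equations (Durbin-Levinson recursion) in place of the least-squares regression described above, and the function names, seed, series length and order k = 20 are arbitrary illustrative choices.

```python
import random

def autocorrelations(u, max_lag):
    """Serial correlations r_0, ..., r_max_lag about the sample mean."""
    n = len(u)
    mean = sum(u) / n
    d = [x - mean for x in u]
    c0 = sum(x * x for x in d)
    return [sum(d[t] * d[t + s] for t in range(n - s)) / c0
            for s in range(max_lag + 1)]

def fit_autoregression(r, k):
    """Yule-Walker fit of order k; returns [1, a_1, ..., a_k] for the
    scheme u_t + a_1 u_{t-1} + ... + a_k u_{t-k} = e_t."""
    phi, ratio = [], 1.0
    for m in range(1, k + 1):
        kappa = (r[m] - sum(phi[j] * r[m - 1 - j] for j in range(m - 1))) / ratio
        phi = [phi[j] - kappa * phi[m - 2 - j] for j in range(m - 1)] + [kappa]
        ratio *= 1.0 - kappa ** 2
    return [1.0] + [-p for p in phi]

def durbin_ma1(a):
    """b = -sum a_j a_{j+1} / sum a_j^2, with a_0 = 1 as first element."""
    return -sum(a[j] * a[j + 1] for j in range(len(a) - 1)) / sum(x * x for x in a)

# Check on the exact coefficients alpha_j = (-beta)^j of the approximating scheme.
beta, k = 0.5, 30
b_exact = durbin_ma1([(-beta) ** j for j in range(k + 1)])

# Simulated series u_t = e_t + beta e_{t-1}.
rng = random.Random(12345)
n = 4000
e = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
u = [e[t + 1] + beta * e[t] for t in range(n)]
b_sim = durbin_ma1(fit_autoregression(autocorrelations(u, 20), 20))
print(round(b_exact, 6), round(b_sim, 3))
```

With the exact coefficients b reproduces beta up to a term of order beta^(2k); for the simulated series it should differ from beta by an amount of the order of the standard error, the square root of (1 - beta^2)/n.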


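50.12 Similar methods give acceptable results for higher-order processes, but the
expressions become more complicated. We will quote without proof the main results.
The process is

    $$ u_t = \sum_{j=0}^{h} \beta_j \varepsilon_{t-j}, $$    (50.49)

with the $\varepsilon$'s independently and identically (but not necessarily normally) distributed.
The asymptotic distribution of the least-squares estimators a of $\alpha$ is given by

    $$ dF = \left( \frac{n}{2\pi} \right)^{k/2} |B|^{1/2} \exp(-\tfrac{1}{2} n Q)\, da, $$    (50.50)

where $\sigma^2 B$ is the dispersion matrix of $u_{t-1}, \ldots, u_{t-k}$ and

    $$ Q = (a - \alpha)' B (a - \alpha) = a'Ba - 2a'B\alpha + \alpha'B\alpha. $$    (50.51)

This simplifies to

    $$ Q = a'Ba + 2a'c - \alpha'c, $$    (50.52)

where $c = (c_1, \ldots, c_k)'$ and $c_r = \gamma_r / \sigma^2$, and again to

    $$ Q = a'Ba + 2a'c + \sum_{j=1}^{h} \beta_j^2. $$    (50.53)

The estimators b of $\beta$ are given by

    $$ \begin{pmatrix}
    \sum_{0}^{k} a_j^2 & \sum_{0}^{k-1} a_j a_{j+1} & \cdots & \sum_{0}^{k-h+1} a_j a_{j+h-1} \\
    \sum_{0}^{k-1} a_j a_{j+1} & \sum_{0}^{k} a_j^2 & \cdots & \sum_{0}^{k-h+2} a_j a_{j+h-2} \\
    \vdots & \vdots & & \vdots \\
    \sum_{0}^{k-h+1} a_j a_{j+h-1} & \sum_{0}^{k-h+2} a_j a_{j+h-2} & \cdots & \sum_{0}^{k} a_j^2
    \end{pmatrix}
    \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_h \end{pmatrix}
    = - \begin{pmatrix} \sum_{0}^{k-1} a_j a_{j+1} \\ \sum_{0}^{k-2} a_j a_{j+2} \\ \vdots \\ \sum_{0}^{k-h} a_j a_{j+h} \end{pmatrix}. $$    (50.54)

The asymptotic variance matrix of b is approximately

    $$ \frac{2}{n} \left[ \frac{\partial^2 Q}{\partial \beta_i\, \partial \beta_j} \right]^{-1}, $$    (50.55)

which may be shown to be equal to U/n, where

    $$ U = \begin{pmatrix}
    1 - \beta_h^2 & \beta_1 - \beta_{h-1}\beta_h & \beta_2 - \beta_{h-2}\beta_h & \cdots & \beta_{h-1} - \beta_1 \beta_h \\
    \beta_1 - \beta_{h-1}\beta_h & 1 + \beta_1^2 - \beta_{h-1}^2 - \beta_h^2 & \beta_1 + \beta_1\beta_2 - \beta_{h-2}\beta_{h-1} - \beta_{h-1}\beta_h & \cdots & \\
    \beta_2 - \beta_{h-2}\beta_h & \beta_1 + \beta_1\beta_2 - \beta_{h-2}\beta_{h-1} - \beta_{h-1}\beta_h & \cdots & & \\
    \vdots & & & &
    \end{pmatrix}. $$    (50.56)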


50.13 The foregoing results provide a basis for the construction of large-sample
tests of hypotheses. For example, in (50.11), to test the hypothesis that $\beta = \beta_0$ we
calculate b from (50.46) and test

    $$ z = \frac{\sqrt{n}\, (b - \beta_0)}{\sqrt{1 - \beta_0^2}} $$    (50.57)

as a normal deviate N(0, 1). Likewise, in the more general scheme (50.49) we test

    $$ T = n \sum_{i,j=1}^{h} u^{ij} (b_i - \beta_i)(b_j - \beta_j) $$    (50.58)

as a $\chi^2$ variable with h d.fr. Here $(u^{ij})$ is the inverse of (50.56).


50.14 Again, a test of the goodness of fit of the whole model may be derived.
With the simpler model (50.11) we note that nQ, where Q is given by (50.45), is distributed
approximately as $\chi^2$ with k d.fr. Substitution in (50.45) from (50.46) shows
that Q can be partitioned in the form

    $$ Q = (1 - b^2) \sum_{j=0}^{k} a_j^2 - 1 + (b - \beta)^2 \sum_{j=0}^{k} a_j^2. $$    (50.59)

Asymptotically $n (b - \beta)^2 \sum a_j^2$ is equivalent to the regression sum of squares in a linear
regression model, the remainder being the residual sum of squares. Thus the goodness
of fit of the model may be examined by testing

    $$ n \Big\{ (1 - b^2) \sum_{j=0}^{k} a_j^2 - 1 \Big\} $$    (50.60)

as a $\chi^2$ variable with k-1 d.fr.
For the more extended model (50.49) the minimum value of Q is given by

    $$ \sum_{j=0}^{k} a_j^2 + \sum_{i=1}^{h} b_i \sum_{j=0}^{k-i} a_j a_{j+i} - 1, $$    (50.61)

which can be tested in $\chi^2$ with k-h d.fr.


For details and some numerical results see Durbin (1959b). Wold (1949) had earlier
suggested a more complicated test. Durbin proved for the second-order case, and conjectured
a general result, to the effect that the limiting dispersion determinant of the
autoregressive scheme $\sum_{j=0}^{h} \alpha_j u_{t-j} = \varepsilon_t$ is the same as the limiting dispersion determinant of
the moving-average scheme $u_t = \sum_{j=0}^{h} \alpha_j \varepsilon_{t-j}$. This was proved by Finch (1960) and by
A. M. Walker (1961). If there is any simple explanation of this remarkable duality it
remains undiscovered.


Autoregressive schemes with moving-average errors 
50.15 Consider now the mixed scheme

    $$ \sum_{j=0}^{k} \alpha_j u_{t-j} = \sum_{j=0}^{h} \beta_j \varepsilon_{t-j}. $$    (50.62)

The problem of finding efficient estimators of $\alpha$'s and $\beta$'s has not been thoroughly
investigated, but the most promising method, due to Durbin (1960b), seems to be to
iterate in the following manner.

Suppose we have a set of values a of $\alpha$. We can then transform the u's to new
variables z by the autoregressive transformation

    $$ z_t = \sum_{j=0}^{k} a_j u_{t-j} $$    (50.63)

and estimate the constants $\beta$ in the model

    $$ z_t = \sum_{j=0}^{h} \beta_j \varepsilon_{t-j}. $$    (50.64)

Having determined estimates b of $\beta$, we can now transform

    $$ \sum_{j} \alpha_j u_{t-j} = \sum_{j} b_j \varepsilon_{t-j} $$

to autoregressive equations of the form

    $$ \sum_{j} \alpha'_j u_{t-j} = \varepsilon_t, $$    (50.65)

where the $\alpha'$ are linear functions of the $\alpha$'s. We can then obtain estimates of the $\alpha'$
and hence of the $\alpha$'s. The cycle can then begin again until the estimates of $\alpha$'s and
$\beta$'s converge.


50.16 The problems of applying this procedure are twofold: to ensure that the
iterative procedure converges satisfactorily, and to find a good set of starting values.
The first problem does not seem to have been thoroughly examined; in time-series
analysis convergence is not a property which can be assumed without extensive practical
testing. As to the second, we recall that the scheme (50.62) can be closely approximated
by an autoregressive scheme of large order. If we fit such a scheme, let the residuals
be $e_t$. In (50.62) we replace $\varepsilon_t$ by $e_t$ and hence obtain preliminary estimates of $\alpha$ and $\beta$
by minimizing

    $$ \sum_t \Big( \sum_{j=0}^{k} \alpha_j u_{t-j} - \sum_{j=1}^{h} \beta_j e_{t-j} \Big)^2. $$    (50.66)


Example 50.4
Consider the model

    $$ u_t + \alpha u_{t-1} = \varepsilon_t + \beta \varepsilon_{t-1}. $$    (50.67)

Approximate by a scheme of order k giving residuals $e_t$,

    $$ e_t = u_t + a_1 u_{t-1} + a_2 u_{t-2} + \ldots + a_k u_{t-k}. $$    (50.68)

We minimize

    $$ \sum (u_t + \alpha u_{t-1} - \beta e_{t-1})^2 $$

to obtain for estimators of $\alpha$ and $\beta$:

    $$ \sum u_t u_{t-1} + \hat{\alpha} \sum u_{t-1}^2 - \hat{\beta} \sum e_{t-1} u_{t-1} = 0, $$    (50.69)
    $$ \sum u_t e_{t-1} + \hat{\alpha} \sum u_{t-1} e_{t-1} - \hat{\beta} \sum e_{t-1}^2 = 0. $$    (50.70)

Substituting for e from (50.68) and replacing $\sum u_t u_{t-p}$ by $(\sum u_t^2)\, r_p$, we find the asymptotic
expressions

    $$ \hat{\alpha} = -\frac{a_1 r_2 + a_2 r_3 + \ldots + a_k r_{k+1}}{a_1 r_1 + a_2 r_2 + \ldots + a_k r_k}, $$    (50.71)

    $$ \hat{\beta} = \frac{r_1 + \hat{\alpha}}{1 + a_1 r_1 + a_2 r_2 + \ldots + a_k r_k}. $$    (50.72)

From these the iterative solution may begin.
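
A sketch of how such starting values might be computed is given below. It rests on assumptions that should be read as ours rather than the text's: the two normal equations are solved after replacing every sum of products by the corresponding serial correlation, which leads to alpha-hat = -(sum of a_j r_{j+1}) / (sum of a_j r_j) and beta-hat = (r_1 + alpha-hat) / (1 + sum of a_j r_j), and the long autoregression is fitted through the Yule-Walker equations. The check feeds in the exact autocorrelations of a first-order mixed scheme with illustrative parameter values, so that the starting values should recover those parameters.

```python
def fit_autoregression(r, k):
    """Yule-Walker fit of order k; returns [1, a_1, ..., a_k] for the
    scheme u_t + a_1 u_{t-1} + ... + a_k u_{t-k} = e_t."""
    phi, ratio = [], 1.0
    for m in range(1, k + 1):
        kappa = (r[m] - sum(phi[j] * r[m - 1 - j] for j in range(m - 1))) / ratio
        phi = [phi[j] - kappa * phi[m - 2 - j] for j in range(m - 1)] + [kappa]
        ratio *= 1.0 - kappa ** 2
    return [1.0] + [-p for p in phi]

def preliminary_estimates(r, k):
    """Starting values for u_t + alpha u_{t-1} = e_t + beta e_{t-1}.
    r must hold the autocorrelations r[0], ..., r[k+1]."""
    a = fit_autoregression(r, k)
    s0 = sum(a[j] * r[j] for j in range(1, k + 1))        # sum of a_j r_j
    s1 = sum(a[j] * r[j + 1] for j in range(1, k + 1))    # sum of a_j r_{j+1}
    alpha_hat = -s1 / s0
    beta_hat = (r[1] + alpha_hat) / (1.0 + s0)
    return alpha_hat, beta_hat

# Exact autocorrelations of the mixed scheme for illustrative parameter values.
alpha, beta, k = -0.6, 0.4, 25
rho = [1.0, (beta - alpha) * (1.0 - alpha * beta) / (1.0 - 2.0 * alpha * beta + beta ** 2)]
for s in range(2, k + 2):
    rho.append(-alpha * rho[s - 1])

alpha_hat, beta_hat = preliminary_estimates(rho, k)
print(round(alpha_hat, 6), round(beta_hat, 6))
```

In practice the r's would be sample serial correlations, and the values obtained would serve only to start the iteration of 50.15.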


50.17 The mixed scheme (50.62) with random values $\varepsilon$ has the autocovariance
generator (cf. 47.15 and 47.18)

    $$ G(z) = \sigma^2\, \frac{(\sum \beta_j z^j)(\sum \beta_j z^{-j})}{(\sum \alpha_j z^j)(\sum \alpha_j z^{-j})}, \qquad \sigma^2 = \operatorname{var} \varepsilon, $$    (50.73)

and the corresponding spectral density is given (cf. 47.14) by

    $$ w(\theta) = \frac{\sigma^2}{\operatorname{var} u}\, \frac{(\sum \beta_j e^{ij\theta})(\sum \beta_j e^{-ij\theta})}{(\sum \alpha_j e^{ij\theta})(\sum \alpha_j e^{-ij\theta})}. $$    (50.74)

From estimates of the $\alpha$'s and $\beta$'s we can then determine the estimated spectral density.
It is more relevant, perhaps, to consider whether the $\alpha$'s and $\beta$'s can be estimated from
the observed spectrum. The question has been examined by Durbin (1961).


50.18 A more general assault on the problems of hypothesis testing and estimation
has been made by Whittle in a series of papers, particularly 1951, 1953a and 1953b.
The methods, based for the most part on Maximum Likelihood considerations, are
penetrating and the results of considerable generality for stationary time-series with
continuous spectra. They are, however, not very suitable for numerical work.


Some multivariate extensions 

50.19 Suppose that we have p series, $u_i$, $i = 1, 2, \ldots, p$, observed at n intervals
of time or, in the continuous case, defined over a certain time-period. The value of $u_i$
at time t will be denoted by $u_{it}$ and the set of $p \times n$ values of u by a matrix u. As in
the univariate case, we regard this as the realization of a process, and our basic object
is to determine what kind of a process it is and what are its parameters.

In general, any row vector of u, considered as a single series, may contain trend,
seasonal, or oscillatory movements. However, it makes for almost unmanageable
complication to try to dissect each vector simultaneously into its constituents. We shall
assume that trend and seasonal movements have been removed, leaving us with a multivariate
stationary complex u. We follow Quenouille (1957).


50.20 The covariance of $u_{it}$ and $u_{j,t+s}$ will be written $\gamma_{ij(s)}$, and the corresponding
correlation $\rho_{ij(s)}$. The analogous observed quantities are $c_{ij(s)}$, $r_{ij(s)}$. For any given
s there are $\tfrac{1}{2} p(p+1)$ of these quantities arrayable in a square matrix which we write
$\gamma_s$, $\rho_s$, $c_s$ or $r_s$, as the case may be. In the univariate case $\rho_s = \rho_{-s}$, but clearly
$E(u_{it} u_{j,t+s})$ is not equal to $E(u_{it} u_{j,t-s})$ but to $E(u_{i,t-s} u_{jt})$. Hence we have

    $$ \gamma_s = \gamma'_{-s}. $$    (50.75)

As usual with multivariate extensions, the number of parameters and estimators
increases rapidly with p. We shall refer to $\gamma_{ij(s)}$ as the cross-covariance of $u_i$ and $u_j$ for
lag s. Likewise $\rho_{ij(s)}$ is a cross-correlation. Where necessary to distinguish between
sample and parental values we shall use those words, although they may often be
omitted when the symbols themselves make it clear which is under reference.


50.21 As in the univariate case, we shall be concerned with three types of model:
autoregressive, moving-average, and mixed autoregressive-moving-average systems.
Let $\varepsilon_{it}$ be a series of independent random elements. Corresponding to the univariate
case

    $$ u_t = \sum_{s=0}^{l} \beta_s \varepsilon_{t-s} $$

we now have

    $$ u_{it} = \sum_{s=0}^{l} \sum_{j=1}^{p} \beta_{ijs}\, \varepsilon_{j,t-s} $$    (50.76)

and a corresponding autoregressive scheme

    $$ \sum_{s=0}^{k} \sum_{j=1}^{p} \alpha_{ijs}\, u_{j,t-s} = \varepsilon_{it}. $$    (50.77)

Writing D for a shift operator such that $D u_t = u_{t-1}$, we may express (50.76) in the form

    $$ u_{it} = \sum_{s=0}^{l} \sum_{j=1}^{p} \beta_{ijs} D^s \varepsilon_{jt} $$    (50.78)

or, in matrix form,

    $$ u_t = \sum_{s=0}^{l} B_s D^s \varepsilon_t $$    (50.79)
    $$ = B(D)\, \varepsilon_t, $$    (50.80)

where

    $$ B(D) = \sum_{s=0}^{l} B_s D^s. $$    (50.81)

Likewise, for the autoregressive scheme of (50.77) we may write

    $$ \varepsilon_t = \sum_{s=0}^{k} A_s D^s u_t = A(D)\, u_t, $$    (50.82)

where

    $$ A(D) = \sum_{s=0}^{k} A_s D^s. $$    (50.83)

We may write the solution of (50.82)

    $$ u_t = A^{-1}(D)\, \varepsilon_t. $$    (50.84)

The terms in A and B are polynomials in D. We also define


50.22 Without loss of generality, we will choose the scales so that the $\varepsilon_i$ have zero
mean, the same variance for all i, say $\sigma^2$, and are uncorrelated. Then we have

    $$ \gamma_s = E(u_t u'_{t-s}). $$    (50.85)

Substituting from (50.79), we find

    $$ \gamma_s = E\Big\{ \Big( \sum_{j=0}^{l} B_j D^j \varepsilon_t \Big) \Big( \sum_{m=0}^{l} B_m D^m \varepsilon_{t-s} \Big)' \Big\} $$
    $$ = \sum_{j=0}^{l} \sum_{m=0}^{l} B_j\, E(\varepsilon_{t-j}\, \varepsilon'_{t-s-m})\, B'_m $$
    $$ = \sigma^2 \sum_{j-m=s} B_j B'_m $$
    $$ = \text{coeff. of } D^s \text{ in } \sigma^2 \Big( \sum B_j D^j \Big) \Big( \sum B'_j D^{-j} \Big). $$

Hence we may write

    $$ \gamma = B(D)\, B'(D^{-1}). $$    (50.86)

Likewise, for the autoregressive equation

    $$ \gamma = A^{-1}(D)\, A'^{-1}(D^{-1}). $$    (50.87)

It is easy to show that for a mixed scheme $A u_t = B \varepsilon_t$

    $$ \gamma = A^{-1}(D)\, B(D)\, B'(D^{-1})\, A'^{-1}(D^{-1}). $$    (50.88)

These are the multivariate analogues of (50.73). In them the D's may be regarded
as dummy variables equivalent to what we have formerly written as z. Equations
(50.86)-(50.88) are, in fact, covariance generating functions.


50.23 For the autoregressive scheme we have natural generalizations of the Yule-Walker
equations (47.66). Postmultiplying (50.82) by $u'_{t-q}$ and taking expectations,

    $$ E\Big( \sum_{s=0}^{k} A_s u_{t-s} u'_{t-q} \Big) = \sum_{s=0}^{k} A_s E(u_{t-s} u'_{t-q}) $$
    $$ = \sum_{s=0}^{k} A_s \gamma_{q-s} = 0, \qquad q > 0. $$    (50.89)

The solution of these equations is not, however, an easy matter.


Degeneracy 

50.24 Apart from the ordinary problems which we encountered in the univariate 
case, there are two further complications for multivariate series. 

In the first place, there may exist linear relations among the variables, in which
case the matrices A or B may become degenerate. Steps must then be taken to remove
some of the variables.


Example 50.5 (Quenouille, 1957)

Consider

    $$ u_{1t} = \varepsilon_{1t} + \varepsilon_{2,t-1}, $$
    $$ u_{2t} = \varepsilon_{1t} + \varepsilon_{2t}, $$    (50.90)
    $$ u_{3t} = \varepsilon_{2t} + \varepsilon_{2,t-1}. $$

We have

    $$ B = \begin{pmatrix} 1 & D \\ 1 & 1 \\ 0 & 1+D \end{pmatrix} $$    (50.91)

and from (50.86)

    $$ \gamma = B(D)\, B'(D^{-1}) = \begin{pmatrix} 2 & 1+D & 1+D \\ 1+D^{-1} & 2 & 1+D^{-1} \\ 1+D^{-1} & 1+D & 2+D+D^{-1} \end{pmatrix}. $$    (50.92)

It is then found that $|\gamma| = 0$ and the matrix has rank 2. There must then be a
linear relation among the variables. In this case we can almost determine it by inspection,
but formally we should look for the zero latent root of $\gamma$ and its associated latent
vector. The latter is proportional to $1+D$, $-(1+D)$, $1-D$ and the relation is therefore

    $$ (1+D)\, u_{1t} - (1+D)\, u_{2t} + (1-D)\, u_{3t} = 0 $$

or

    $$ u_{1t} + u_{1,t-1} + u_{3t} = u_{2t} + u_{2,t-1} + u_{3,t-1}. $$    (50.93)
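
The linear relation of Example 50.5 is an exact identity and can be confirmed numerically. The sketch below reads the three series as u_1t = e_1t + e_2,t-1, u_2t = e_1t + e_2t and u_3t = e_2t + e_2,t-1 (our reading of (50.90)) and evaluates the difference between the two sides of the relation u_1t + u_1,t-1 + u_3t = u_2t + u_2,t-1 + u_3,t-1; the seed and the series length are arbitrary.

```python
import random

rng = random.Random(7)                 # arbitrary seed
n = 200
e1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
e2 = [rng.gauss(0.0, 1.0) for _ in range(n)]

# The three series for t = 1, ..., n-1 (t = 0 has no lagged value available).
u1 = [e1[t] + e2[t - 1] for t in range(1, n)]
u2 = [e1[t] + e2[t] for t in range(1, n)]
u3 = [e2[t] + e2[t - 1] for t in range(1, n)]

# Left-hand side minus right-hand side of the relation at every admissible t.
gap = [u1[t] + u1[t - 1] + u3[t] - (u2[t] + u2[t - 1] + u3[t - 1])
       for t in range(1, len(u1))]
print(max(abs(g) for g in gap))
```

The largest discrepancy is at the level of rounding error, so that one of the three series is redundant once the other two are known.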


50.25 Degeneracies are, in practice, the exception rather than the rule, and when
they occur can be dealt with fairly easily in the manner of Example 50.5. More important,
and more difficult to deal with, is the fact that equation (50.88) does not determine
A and B uniquely, however good our estimators of $\gamma$.

Write temporarily F for $A^{-1}B$. Then (50.88) is equivalent to

    $$ \gamma = F(D)\, F'(D^{-1}). $$    (50.94)

If $\phi$ is any diagonal matrix for which the ith diagonal element is $\phi_i(D)/\phi_i(D^{-1})$, if $\psi$ is
any diagonal matrix with diagonal elements $\psi_i(D)/\psi_i(D^{-1})$, and if J is any matrix such
that $JJ' = I$, it is easily seen by substitution in (50.94) that $F(D)\, \phi(D)\, J\, \psi(D)$ may replace
F(D). Thus, many different schemes may give rise to the same covariance matrix.


Example 50.6 (Quenouille, 1957) 
Consider the matrices

    $$ F_1 = \begin{pmatrix} 2+D & D \\ 1 & 6+D \end{pmatrix}, \qquad F_2 = \begin{pmatrix} 1+2D & -D \\ 5 & 3+2D \end{pmatrix}, $$
E — :( 2+7р 2) Е 1055900 ЖЕЛ), 
2 54-11-60 2843D/' + 17455+300 1+84р J’ 
It can easily be verified that

    $$ |F_1| = (4+D)(3+D), \qquad |F_2| = (3+D)(1+4D), $$
    $$ |F_3| = (4+D)(1+3D), \qquad |F_4| = (1+4D)(1+3D), $$

and that for each F

    $$ F(D)\, F'(D^{-1}) = \begin{pmatrix} 6+2D+2D^{-1} & 3+7D \\ 3+7D^{-1} & 38+6D+6D^{-1} \end{pmatrix}. $$

Furthermore, if we postmultiply the F's respectively by orthogonal matrices $J_1, J_2, J_3, J_4$,
$\gamma$ is unaltered.
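
That two different matrices F can generate the same covariance matrix may be verified by direct multiplication of polynomials in D. In the sketch below a polynomial is held as a dictionary {power of D: coefficient}; the matrices F1 = (2+D, D; 1, 6+D) and F2 = (1+2D, -D; 5, 3+2D) are our reading of the first two matrices of Example 50.6, chosen to agree with the determinants (4+D)(3+D) and (3+D)(1+4D) and with the common product stated there, and should be treated as an assumption. The helper names are illustrative.

```python
def mul(p, q):
    """Product of two polynomials in D and 1/D."""
    out = {}
    for i, a in p.items():
        for j, b in q.items():
            out[i + j] = out.get(i + j, 0) + a * b
    return {e: c for e, c in out.items() if c != 0}

def add(p, q):
    """Sum of two polynomials, dropping terms that cancel."""
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) + c
    return {e: c for e, c in out.items() if c != 0}

def reverse(p):
    """Replace D by 1/D."""
    return {-e: c for e, c in p.items()}

def generating_matrix(F):
    """F(D) F'(1/D) for a 2 x 2 matrix of polynomials in D."""
    return [[add(mul(F[i][0], reverse(F[j][0])), mul(F[i][1], reverse(F[j][1])))
             for j in range(2)] for i in range(2)]

def determinant(F):
    """Determinant of a 2 x 2 matrix of polynomials in D."""
    cross = mul(F[0][1], F[1][0])
    return add(mul(F[0][0], F[1][1]), {e: -c for e, c in cross.items()})

F1 = [[{0: 2, 1: 1}, {1: 1}], [{0: 1}, {0: 6, 1: 1}]]
F2 = [[{0: 1, 1: 2}, {1: -1}], [{0: 5}, {0: 3, 1: 2}]]

print(generating_matrix(F1) == generating_matrix(F2))
print(determinant(F1), determinant(F2))
```

The two determinants differ in one factor only, 4+D being replaced by 1+4D, which is the kind of substitution that leaves the covariance generating function unchanged.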


50.26 It remains for consideration whether all the possible solutions of (50.84)
are acceptable; for example, whether they all provide stationary series. So far as is
known, some fairly stringent conditions must be imposed before we can derive a unique
solution. The following treatment is due to Phillips (1959).

We consider the mixed scheme with independent residuals and assume (1) that A
is non-degenerate (which we can always ensure as in Example 50.5) and (2) that $|A|$,
a polynomial in D, has different roots $\lambda_1, \lambda_2, \ldots, \lambda_m$. Then if a(D) is the adjoint of
A(D), we may write (50.88) as

    $$ \gamma = \frac{a(D)\, B(D)\, B'(D^{-1})\, a'(D^{-1})}{|A(D)|\, |A(D^{-1})|}. $$    (50.95)

Expressing $|A|$ as the product $\prod_{r=1}^{m} (D - \lambda_r)$, we see that the right-hand side may be
expressed in partial fractions

    $$ \gamma = \sum_{r=1}^{m} \frac{K_r}{D - \lambda_r} + \sum_{r=1}^{m} \frac{K'_r}{D^{-1} - \lambda_r}, $$    (50.96)

where $K_r$ is a $p \times p$ matrix given, according to the usual theory of partial fractions, by

    $$ K_r = \left[ \frac{(D - \lambda_r)\, a(D)\, B(D)\, B'(D^{-1})\, a'(D^{-1})}{|A(D)|\, |A(D^{-1})|} \right]_{D = \lambda_r}. $$    (50.97)

In (50.96) we do not want terms in positive powers of D, which implies the condition
that $|B|$ is of lower degree than $|A|$.
For a simple root $\lambda_r$ the matrix $A(\lambda_r)$ is simply degenerate, and its adjoint $a(\lambda_r)$ is
of unit rank, a known result in matrix theory. We may then write

    $$ a(\lambda_r) = k_r \kappa_r, $$    (50.98)

where $k_r$ is a $(p \times 1)$ column vector and $\kappa_r$ is a $(1 \times p)$ row vector satisfying

    $$ A(\lambda_r)\, k_r = 0, $$    (50.99)
    $$ \kappa_r\, A(\lambda_r) = 0. $$    (50.100)

In point of fact $K_r$ itself is of unit rank. For if we define a $(1 \times p)$ row vector $l_r$ by

    $$ l_r = \left[ \frac{(D - \lambda_r)\, \kappa_r\, B(D)\, B'(D^{-1})\, a'(D^{-1})}{|A(D)|\, |A(D^{-1})|} \right]_{D = \lambda_r}, $$    (50.101)

we find on substituting (50.98) in (50.97) that

    $$ K_r = k_r l_r. $$    (50.102)

Now from (50.99) we have

    $$ A(\lambda_r)\, K_r = 0, \qquad r = 1, 2, \ldots, m. $$    (50.103)

Given, then, the covariance matrix $\gamma$, we express it in partial fractions and hence determine
$K_r$ and $\lambda_r$. We can thus derive the set of equations (50.103). The question is
whether this set is enough to determine the coefficients in A uniquely.


50.27 Consider first of all the case when all the scalar equations in $A u_t = B \varepsilon_t$
are of the same order v and cannot be reduced to lower order. We now impose two
further conditions, (a) that the elements in the leading diagonal of A are of degree v
but that non-diagonal elements are of degree v-1 at most (this means, among other
things, that $|A|$ is not zero); (b) that the elements of the corresponding row in B
are of lower degree than v (this means that no terms in D arise as numerators in (50.96)).

Without loss of generality we may suppose that the coefficient of $D^v$ in each diagonal
term in A is unity. $|A|$ is of degree pv which is therefore equal to m. Any given
row in A then has pv coefficients to be determined, and equation (50.103), for m = pv
values of r, provides a set of non-homogeneous independent equations. Thus the
coefficients are uniquely determined.

When A is determined we find $B(D)\, B'(D^{-1})$ from

    $$ B(D)\, B'(D^{-1}) = \sum_{r=1}^{m} \frac{A(D)\, K_r\, A'(D^{-1})}{D - \lambda_r} + \sum_{r=1}^{m} \frac{A(D)\, K'_r\, A'(D^{-1})}{D^{-1} - \lambda_r}, $$    (50.104)

which is derived from (50.95) and (50.96). There remains an indeterminacy for B
itself. This can be resolved only by extraneous information.


50.28 If the equations in the system are not all of the same order we require still
one further assumption to identify A. Let the equations in $A u_t = B \varepsilon_t$ be arranged
such that the first equation is of lowest order and any subsequent equation is not of
lower order than its predecessor. Then if A and B satisfy (50.88), so do $\mu A$ and $\mu B$,
where $\mu$ is an arbitrary matrix of constants. We can add any row of A to a later row
without violating the condition that the non-diagonal elements be of lower degree than
the diagonal elements. But we cannot add to a preceding row. Thus $\mu$ can only
be a triangular matrix with zeros above the diagonal.

We can make the system identifiable if we are prepared to assume that the elements
$\varepsilon$ are not correlated from one equation to another, that is to say that B is diagonal.
For then $B(D)\, B'(D^{-1})$ is diagonal, and hence the non-diagonal elements of $\mu B B' \mu'$
are zero. Writing $b_i$ for the diagonal elements of $BB'$, the non-diagonal elements of
$\mu B(D)\, B'(D^{-1})\, \mu'$ above the diagonal are found to be

    $$ \mu_{11} b_1 \mu_{21}, \quad \mu_{11} b_1 \mu_{31}, \quad \mu_{11} b_1 \mu_{41}, \ \ldots $$
    $$ \mu_{21} b_1 \mu_{31} + \mu_{22} b_2 \mu_{32}, \quad \mu_{21} b_1 \mu_{41} + \mu_{22} b_2 \mu_{42}, \ \ldots $$

Since $\mu_{11}$ cannot vanish, $\mu_{21}$, $\mu_{31}$, etc. must do so; and hence $\mu_{32}$, $\mu_{42}$, etc.; and so on.
Hence $\mu$ is diagonal and the equations are identifiable.


50.29 It will be evident enough that the problems associated with identifiability
are formidable. We can, at least on a heuristic basis, estimate the covariance matrix $\gamma$
and hence the product $A^{-1}(D)\, B(D)\, B'(D^{-1})\, A'^{-1}(D^{-1})$. To proceed thence to the
individual coefficients in A and B requires conditions on the problem which are not
always easy to verify; and in any case appeal to extraneous knowledge is sometimes
necessary to reach determinacy. One of the major outstanding problems of multivariate
temporal systems, in fact, is to ensure that a model is unique; and this apart
from sampling considerations.


Cross-spectra 

50.30 Just as we may consider the cross-correlations of series and obtain what 
might be called cross-correlograms, so we may examine the extension of spectrum 
analysis to the simultaneous variation of series. 

For any pair of series, say $u_1$ and $u_2$, we have a set of cross-correlations $\rho_{12(s)}$,
$s = -\infty, \ldots, \infty$ and, in extension of the spectrum of a single series (47.21), may define
a spectral density

    $$ w_{12}(\alpha) = \sum_{s=-\infty}^{\infty} \rho_{12(s)} \exp(i s \alpha). $$    (50.105)

There is a corresponding spectral function $W(\alpha)$ defined over the range 0 to $\pi$.
Conversely, as at (47.23),

    $$ \rho_{12(s)} = (1/\pi) \int_0^{\pi} w_{12}(\alpha) \exp(-i s \alpha)\, d\alpha. $$    (50.106)

In univariate formulae, owing to the symmetry typified by $\rho_s = \rho_{-s}$, sine terms
disappear from expressions relating spectral density to covariances or correlations. In
the multivariate case, $\rho_{12(s)}$ is not the same as $\rho_{12(-s)}$. Expansion of (50.105) gives us

    $$ w_{12}(\alpha) = \rho_{12(0)} + \sum_{s=1}^{\infty} \{ \rho_{12(s)} \cos s\alpha + \rho_{12(-s)} \cos s\alpha \} + i \sum_{s=1}^{\infty} \{ \rho_{12(s)} \sin s\alpha - \rho_{12(-s)} \sin s\alpha \} $$
    $$ = c(\alpha) + i q(\alpha), \text{ say.} $$    (50.107)


50.31 The quantity $c(\alpha)$ is called the co-spectrum or co-spectral density. $q(\alpha)$ is
called the quadrature spectrum or quadrature spectral density. Sometimes both these
quantities are plotted against $\alpha$. The sum of squares $c^2 + q^2$ is called the amplitude
of the spectrum. The standardized quantity

    $$ C(\alpha) = \frac{c^2(\alpha) + q^2(\alpha)}{w_1(\alpha)\, w_2(\alpha)}, $$    (50.108)

where $w_1$ and $w_2$ are the spectral densities of $u_1$ and $u_2$, is called the coherence.
Phase relationships in the series are studied by three types of diagram: the phase
diagram, plotting $\psi(\alpha)$ against $\alpha$, where

    $$ \psi(\alpha) = \arctan \{ q(\alpha) / c(\alpha) \}, $$    (50.109)
the Argand diagram, which plots c(z) /t (2) as abscissa against ф(ж)/ (ж) as ordinate; 
and the gain diagram, plotting $R^2_{12}(\alpha)$ against $\alpha$, where

    $$ R^2_{12}(\alpha) = \frac{w_1(\alpha)\, C(\alpha)}{w_2(\alpha)}. $$    (50.110)

A good computer programme will calculate and graph the quantities required. (50.108)
and (50.110) are analogues of correlation and regression coefficients. Some further
details are given by Granger and Hatanaka (1964).
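
The quantities of 50.30 and 50.31 are simple to compute once the cross-correlations are available. The following sketch, with illustrative function names of our own, evaluates the cross-spectral density as the sum of rho_12(s) exp(i s alpha) over the lags supplied, and derives from it the co-spectrum, quadrature spectrum, coherence and phase. It is illustrated on an artificial case in which u_2 is u_1 delayed by one step and u_1 is purely random, so that the only non-zero cross-correlation is at lag 1 and both individual spectral densities are taken as unity.

```python
import cmath
import math

def cross_spectrum(rho12, alpha):
    """Sum of rho12[s] * exp(i s alpha) over a dict {lag s: cross-correlation}."""
    return sum(r * cmath.exp(1j * s * alpha) for s, r in rho12.items())

def summary(rho12, w1, w2, alpha):
    """Co-spectrum, quadrature spectrum, coherence and phase at frequency alpha.
    w1 and w2 are the spectral densities of the two series at alpha."""
    w = cross_spectrum(rho12, alpha)
    c, q = w.real, w.imag
    coherence = (c * c + q * q) / (w1 * w2)
    phase = math.atan2(q, c)          # arc tan(q / c), with the quadrant resolved
    return c, q, coherence, phase

rho12 = {1: 1.0}                      # pure delay of one step (illustrative)
for alpha in (0.5, 1.0, 2.0):
    print(alpha, summary(rho12, 1.0, 1.0, alpha))
```

For a pure delay the coherence is unity at all frequencies and the phase increases linearly with alpha, which is the pattern a phase diagram is designed to reveal.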


50.32 For multivariate series of the autoregressive or moving-average types there
is a straightforward generalization of the relation between the covariance-generating
function and the spectral density. We have, in fact,

    $$ w(\alpha) = A^{-1}(e^{i\alpha})\, B(e^{i\alpha})\, B'(e^{-i\alpha})\, A'^{-1}(e^{-i\alpha}), $$    (50.111)

but this is not, in practice, a very useful formula.

The generalization of spectra and cross-spectra to polyspectra for k-dimensional 
time-series is discussed by Brillinger (1965). 


Example 50.7 

To give some idea of what cross-correlations and cross-spectra look like we take an 
artificial series constructed by Quenouille (1957), reproduced in Table 50.2. The 
series was constructed from 


шц = Uy, yy 70 dus py tery 
им = 02и, pa + tle, p-1— 03ta, ¢_1 + Ep (50.112) 
иң = 0-9u, 1+3. 

The $\varepsilon$'s are rectangular random variables ranging from -49 to +49.


Table 50.3 gives, for s = 0 to 5, the theoretical covariances $\gamma_s$ and the observed
covariances $c_s$. The serial correlations up to order 25 are given in Table 50.4.

In Figs. 50.2 and 50.3 we have graphed the logarithms of the spectral ordinates
of the three series and the logarithms of the amplitudes of the cross-spectra. The
series are effectively Markovian. Their cross-spectra exhibit much the same pattern
as the schemes themselves. In general, moments of order higher than 2 are involved.


50.33 It hardly needs to be stated that problems of estimation and hypothesis
testing for multivariate series are much more complicated than those of the univariate
case, which themselves, as we have seen, are far from simple.

Scrutiny of the correlogram or power spectrum for individual series will usually 
suggest whether a Markoff scheme is likely to be sufficient, or whether some more 
elaborate scheme may be required to explain observations. The basic elements used 
in deciding such questions are the serial correlations or serial covariances. We may

Table 50.3. Values of $\gamma_s$ and $c_s$ for the series of Table 50.2
(Values underlined are the values of the determinants of the corresponding matrices)

s        $\gamma_s$        $c_s$

4,392-92 6,191-72 4,010-45 
3,396.37 5,010445  5,831-9t 
5:2245 x 1019 
[7,438.85 3,773.75 2,895.33 
4,949-64 5,567.17 3,940- as] 
3,935-63 5,572.55 4,509.41 
1:4106 x 101^ 


[6,943-89 3,217.03 119038) 


paz 4,392-92 E 


,9 


5,251.32. 4,650415 3,166-38 
14,454:67 5,010-45 3,546.13. 
3:8087 x 10* 
[6,418.76 2,752.01 2,1846 
5,303.69 3,790-42 25231] 
L4/726-19 4,185314 2,849:74. 
1:0283 x 10° 
[5,888.39 2,372.97 1,924-39 
5,169-59 3,085.29 218482] 
_4,773:33 3,411.38 2,342:53. 
2:7765 x 108 


[5,371443 2,064-44 іэ 


4,915:27 2,536-47 1,866-94 
L4, 652-63 2,776776 1,966-34. 


[6,896-05 4413-37 3,511.74] 
441337 662507 5,573-84| 
L3,511.74 5,573-84 6,161.75 
3:8320 x 101 
[648704 387769 309775 
4822-82 5,984-09 4,624-31 
L3,98248 5,944-23 4,97927] 
8:2402 x 10° 
[6,053.67 3,384-15 2,784-754 
5,008-42 5,175-63 3,681-88 
[1142145 5,31540 3,925.26] 
3:4807 x 10° 
[5,589-50 2,865-88 2,292.22 
4877-75 42241 2,932.37 
4,510.09 4549-42 3,332.35 
2:6326 x 10° 
г5,128:65 2,388:92 1,928.00] 
4698-35 3,511-09 2359-65 
4376-84 3,819:96 2,824-63- 
2:5781 x 10* 
[4,602.27 2,090625 1,825-867 
4,54936 2770:88 1,835-51 
L421130 3,4085. 2,009.68] 


7:4966 x 107 9-1749 x 108 
Fig. 50.2. Spectral functions of the three series of Table 50.2
The ordinate is the logarithm to base 10 of the spectral density divided by the variance
of the series; the abscissa is the frequency (cycles per unit).


Table 50.4. Serial and cross-correlations of the series of Table 50.2
Correlations (decimal points omitted)

Order of      Series  Series  Series   2 leading  3 leading  1 leading  3 leading  1 leading  2 leading
correlation      1       2       3         1          1          2          2          3          3
     0         1000    1000    1000       598        459        598        855        459        855
     1          933     898     787       677        551        512        929        389        693
     2          860     769     600       714        635        434        825        341        534
     3          782     617     496       697        655        354        697        258        408
     4          707     500     404       671        638        278        574        194        307
     5          617     377     257       650        614        229        457        177        218
     6          525     276     169       603        612        190        309        127        161
     7          436     201     113       544        557        146        213        080        113
     8          353     159     080       445        476        102        152        072        059
     9          281     093     054       351        392        072        126        031        023
    10          227     053     016       287        304        033        066       -017       -025
    11          201    -014    -026       233        266       -012        034       -058       -116
    12          174    -090    -099       155        219       -070        001       -085       -179
    13          117    -177    -116       086        144       -116       -046       -143       -232
    14          068    -258    -110       015        083       -172       -091       -216       -283
    15          005    -316    -162      -036        031       -252       -159       -271       -350
    16         -045    -361    -260      -096       -006       -318       -218       -305       -374
    17         -110    -404    -276      -120       -059       -362       -253       -357       -388
    18         -179    -419    -271      -137       -063       -416       -270       -425       -397
    19         -259    -438    -234      -192       -049       -471       -276       -474       -412
    20         -323    -462    -330      -279       -094       -516       -291       -493       -429
    21         -381    -459    -382      -345       -159       -535       -358       -501       -482
    22         -434    -535    -414      -391       -214       -545       -411       -497       -493
    23         -470    -548    -401      -431       -259       -538       -448       -507       -485
    24         -467    -531    -406      -485       -321       -517       -459       -465       -444
    25         -473    -476    -397      -501       -345       -476       -449       -457       -407
Fig. 50.3. Amplitude of cross-spectra of the three series of Table 50.2
The ordinate is the logarithm to base 10 of the cross-spectral density divided by the
square root of the product of the variances of the corresponding series.
expect that, just as in multivariate theory the dispersion determinant takes over the
role of the variance, so in multivariate time-series adequate tests will be based on 
autocovariance or autocorrelation determinants. The exact sampling theory of such 
quantities has not been developed and we must be content, in the present state of 
knowledge, with somewhat imprecise, though intuitively reasonable, procedures. 


50.34 Let us begin, then, with a consideration of the covariance determinants
$|\gamma_s|$. If the scheme is one of moving averages, all such determinants vanish from
some value of s, say l, onwards. If it is autoregressive, there are relations between
the successive matrices $\gamma$. For example, if the scheme is of Markoff type,


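$$|\gamma_s| = B^s |\gamma_0| \tag{50.113}$$
(cf. (47.72)) where
$$B = -|A_1| / |A_0|, \tag{50.114}$$
and if it is of the Yule type (cf. Example 47.8),
$$\begin{vmatrix} \gamma_{s+1} & \gamma_s \\ \gamma_s & \gamma_{s-1} \end{vmatrix} = B^s \begin{vmatrix} \gamma_1 & \gamma_0 \\ \gamma_0 & \gamma_{-1} \end{vmatrix} \tag{50.115}$$
where
$$B = -|A_2| / |A_0|. \tag{50.116}$$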


Unfortunately these relations do not work well in practice because of the high degree 
of sampling variation which obscures the true facts. 

For example, with the series of Example 50.7 the values of $|c_s|$, and those of $|\gamma_s|$
for s = 0, ..., 5, are

s    |γ_s|            |c_s|

0    5.225 x 10^10    3.832 x 10^10
1    1.411 x 10^10    8.240 x 10^9
2    3.809 x 10^9     3.481 x 10^9
3    1.028 x 10^9     2.633 x 10^9
4    2.777 x 10^8     2.578 x 10^9
5    7.497 x 10^7     9.175 x 10^8


The values fluctuate too much to provide a very clear guide. 
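
The fluctuation can be made visible by forming the ratios of successive determinants. The sketch below is only an illustration: it takes the second column of the table above as the determinants of the observed lagged covariance matrices (an assumption about the column labels), and the helper name `successive_ratios` is hypothetical. If the determinants shrank by a roughly constant factor, as a Markoff scheme would suggest, these ratios would be nearly equal.

```python
# Determinants for s = 0..5, copied from the second column of the table above
# (assumed here to be the observed values).
observed = [3.832e10, 8.240e9, 3.481e9, 2.633e9, 2.578e9, 9.175e8]


def successive_ratios(values):
    """Return value[s] / value[s-1] for s = 1, 2, ..."""
    return [later / earlier for earlier, later in zip(values, values[1:])]


ratios = successive_ratios(observed)
for s, ratio in enumerate(ratios, start=1):
    print(f"s = {s}: ratio to previous determinant = {ratio:.3f}")
```

The ratios range from roughly 0.2 to nearly 1, which is the irregularity the text refers to.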


50.35 A further possibility is to consider the ratios $\gamma_s \gamma_{s-1}^{-1}$. For instance, with a
Markoff scheme we should expect the sequence of the determinants of such values to
diminish steadily, according to (50.113). It seems, however, that they fluctuate
considerably.

For some further work in this field reference may be made to Bartlett and Rajalakshman
(1953) and the monograph by Quenouille (1957), who generalizes the test of 50.9
to the multivariate case.


Systems of equations 

50.36 In constructing a mathematical model of a system we are usually led to a 
specification in terms of a set of relations among various kinds of quantities. For 
simplicity we shall suppose that these relations are all equations (and not, for example, 
inequalities) and that they are linear. In practice this latter condition is not so restrictive
as it might appear; sometimes we can get rid of curvilinearities by variate transformations,
sometimes a curvilinear relation can be replaced by linear ones in the way that a
curve may be replaced approximately by segments of straight lines.


50.37 Outside of the physical sciences, exact mathematical relations of a deterministic
kind are rare. In the typical situation we have linear relations among variables
which are inexact in the sense that error terms are present. For example, consider
the simple assumed relation between two observed variables y and x

$$y = \beta x. \tag{50.117}$$
This may be inexact for at least three reasons: (1) the relationship between y and x
is not linear; (2) the observed variables are subject to errors of observation, in which
case the true relationship applies to unobservable variables $\eta$ and $\xi$; (3) the relation
is exact as far as it goes, but there are other variables also influencing y and the correct
relation is

$$y = \beta x + \varepsilon, \tag{50.118}$$
where $\varepsilon$, at this stage, merely stands for something unknown which we cannot specify
more explicitly.


50.38 Equation (50.118) is a structural relation among variables which are not
necessarily stochastic. It is not a regression equation. However, when faced with
such relations in practice it is not unreasonable to postulate that $\varepsilon$ behaves like a random
variable, and to depart from that assumption only when evidence about the actual
behaviour of $\varepsilon$ is accumulated. We shall, moreover, assume that variables y and x
are not subject to errors of observation.

We are thus led to consider systems of equations of linear type which do not
incorporate errors of observation but do incorporate a stochastic element. Our object
is to use the observations to estimate the constants in these equations and the variances
of the stochastic terms. We have already considered some systems of the kind:
(a) regressions with independent errors, (b) autoregressions with independent errors, and
(c) autoregressions with moving-average errors. We proceed to consider briefly two
other types: (d) regressions with autocorrelated errors, and (e) mixed regressive-autoregressive
systems.


Regression with autocorrelated errors 

50.39 This case appears to have been first discussed in any detail by Cochrane 
and Orcutt (1949), who pointed out that least-squares estimation was not free from bias
when the error terms were correlated. A test for the existence of such correlation 
was provided by Durbin and Watson (1950-1). Exact results are difficult to obtain, 
but Durbin and Watson set up a test statistic which, in effect, falls between two other 
statistics, each of which follows R. L. Anderson's distribution (48.8). See also Watson 
(1955) and Watson and Hannan (1956). 


50.40 Consider a regression of y on fixed x's,
$$y_t = \sum_i \beta_i x_{it} + u_t, \qquad t = 1, 2, \ldots, n, \tag{50.119}$$
where $u_t$ is written instead of the usual $\varepsilon_t$ to denote that it is autocorrelated. We
may often, without serious error, represent the autocorrelation structure of u by assuming
it to be autoregressive:

$$\sum_{j=0}^{k} \alpha_j u_{t-j} = \varepsilon_t. \tag{50.120}$$
If the $\alpha$'s were known we could transform (50.119) to
$$\sum_{j=0}^{k} \alpha_j y_{t-j} = \sum_i \beta_i \sum_{j=0}^{k} \alpha_j x_{i,t-j} + \varepsilon_t, \tag{50.121}$$
namely to
$$y'_t = \sum_i \beta_i x'_{it} + \varepsilon_t, \tag{50.122}$$
where
$$y'_t = \sum_{j=0}^{k} \alpha_j y_{t-j}, \tag{50.123}$$
$$x'_{it} = \sum_{j=0}^{k} \alpha_j x_{i,t-j}. \tag{50.124}$$

Equation (50.122) is now an ordinary regression. Cochrane and Orcutt (1949), to
whom this so-called “autoregressive transformation” is due, suggest guessing values
of $\alpha$, estimating from (50.122), and iterating the process if necessary by recalculating
residuals and finding a further approximation to the $\alpha$'s.
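
The iteration just described can be sketched for the simplest case. The code below is a minimal illustration, not the authors' procedure in full: it assumes a single regressor without intercept, an autoregressive error of order k = 1 with $\alpha_0 = 1$ (so that $u_t + \alpha u_{t-1} = \varepsilon_t$), and simulated data. The function names, the seed and the starting value $\alpha = 0$ are all hypothetical choices.

```python
import random


def ols_slope(x, y):
    """Least-squares slope for y = slope * x + error (no intercept)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def cochrane_orcutt(x, y, n_iter=20):
    """Alternate between estimating beta and alpha for
    y_t = beta * x_t + u_t,  u_t + alpha * u_{t-1} = eps_t."""
    alpha = 0.0  # starting guess: no autocorrelation
    beta = 0.0
    for _ in range(n_iter):
        # Transformed variables, (50.123) and (50.124) with k = 1.
        y_star = [y[t] + alpha * y[t - 1] for t in range(1, len(y))]
        x_star = [x[t] + alpha * x[t - 1] for t in range(1, len(x))]
        beta = ols_slope(x_star, y_star)
        # Residuals of the untransformed equation (50.119).
        u = [yt - beta * xt for xt, yt in zip(x, y)]
        # Regressing u_t on u_{t-1} estimates -alpha.
        alpha = -ols_slope(u[:-1], u[1:])
    return beta, alpha


# Simulated example: beta = 2, alpha = -0.6, i.e. u_t = 0.6 u_{t-1} + eps_t.
rng = random.Random(0)
n = 2000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
u_sim = [rng.gauss(0.0, 1.0)]
for _ in range(n - 1):
    u_sim.append(0.6 * u_sim[-1] + rng.gauss(0.0, 1.0))
y = [2.0 * xt + ut for xt, ut in zip(x, u_sim)]

beta_hat, alpha_hat = cochrane_orcutt(x, y)
print(beta_hat, alpha_hat)
```

With a long simulated series the two estimates should settle close to the values used to generate the data; with short series the iteration may need more care.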
Durbin (1960b) has proposed an alternative procedure which yields asymptotically
efficient estimators. Writing $\gamma_{ij} = \beta_i \alpha_j$ and taking $\alpha_0 = 1$, we put (50.121) in the form

$$y_t + \sum_{j=1}^{k} \alpha_j y_{t-j} = \sum_i \sum_{j=0}^{k} \gamma_{ij} x_{i,t-j} + \varepsilon_t. \tag{50.125}$$

If the $\gamma$'s were independent we could, as indicated below, regard this asymptotically
as a regression of $y_t$ on the other y's and the x's, and derive least-squares estimators of
$\alpha$ and $\gamma$. If the corresponding estimators of $\alpha$, $\beta$, $\gamma$ are a, b, c we have, in virtue of
(50.119),

$$y_t + \sum_{j=1}^{k} a_j y_{t-j} - \sum_i \sum_{j=0}^{k} c_{ij} x_{i,t-j} = u_t + \sum_{j=1}^{k} a_j u_{t-j} - \sum_i \sum_{j=0}^{k} (c_{ij} - a_j \beta_i) x_{i,t-j}, \tag{50.126}$$
where $a_0 = 1$.

Hence the a's and $(c - a\beta)$'s are least-squares coefficients of regression on $u_{t-j}$ and
$x_{i,t-j}$. Consequently the quantities $a_j - \alpha_j$ and $c_{ij} - a_j \beta_i$ are asymptotically normal
with zero means and ascertainable dispersion matrix. We can therefore write down
their likelihood and maximize it to obtain estimators of $\alpha$ and $\beta$.

In certain cases the least-squares estimates derived simply from (50.119) are asymptotically
efficient (cf. Grenander and Rosenblatt (1957) and R. L. and T. W. Anderson
(1950)), but tests of hypotheses are impaired.


Mixed autoregressive-regressive systems 
50.41 Consider now the case where an autoregressive set of y's is regressed on
fixed x's:
$$\sum_{j=0}^{k} \alpha_j y_{t-j} = \sum_i \beta_i x_{it} + \varepsilon_t. \tag{50.127}$$
We can express this in a form similar to (50.126), taking $\alpha_0 = 1$:
$$y_t = -\sum_{j=1}^{k} \alpha_j y_{t-j} + \sum_i \beta_i x_{it} + \varepsilon_t. \tag{50.128}$$

However, this is not a regression with fixed variables on the right, owing to the appearance
of the lagged y's. Durbin showed (1960a) that asymptotically the properties of
least-squares estimators in such a system are the same as those without lagged variables,
whether or not the residuals are normally distributed. This is a natural extension of
the Mann-Wald theorem mentioned in 50.7.
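
A small simulation may make the large-sample claim concrete. The sketch below is an illustration under stated assumptions, not a proof: it takes k = 1 and a single x, writes the scheme as $y_t = -\alpha y_{t-1} + \beta x_t + \varepsilon_t$, and fits it by ordinary least squares on the lagged y and the x. The helper `ols_two`, the seed and the parameter values are hypothetical.

```python
import random


def ols_two(z1, z2, y):
    """Solve the 2 x 2 normal equations for y = c1*z1 + c2*z2 + error."""
    s11 = sum(a * a for a in z1)
    s22 = sum(b * b for b in z2)
    s12 = sum(a * b for a, b in zip(z1, z2))
    s1y = sum(a * c for a, c in zip(z1, y))
    s2y = sum(b * c for b, c in zip(z2, y))
    det = s11 * s22 - s12 * s12
    c1 = (s22 * s1y - s12 * s2y) / det
    c2 = (s11 * s2y - s12 * s1y) / det
    return c1, c2


# Simulate y_t = 0.5 y_{t-1} + 1.5 x_t + eps_t, i.e. alpha = -0.5, beta = 1.5.
rng = random.Random(1)
n = 3000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [0.0]
for t in range(1, n):
    y.append(0.5 * y[t - 1] + 1.5 * x[t] + rng.gauss(0.0, 1.0))

# Regress y_t on (y_{t-1}, x_t); the first coefficient estimates -alpha.
lag_coef, beta_hat = ols_two(y[:-1], x[1:], y[1:])
alpha_hat = -lag_coef
print(alpha_hat, beta_hat)
```

In long series the estimates should lie near the generating values even though a lagged y appears among the regressors, which is the behaviour the asymptotic result describes.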


50.42 We shall not have the space to develop any further the theory of estimation 
and testing in statistical models, a subject of major importance which is full of pitfalls. 
Some general comments may, however, be useful. 


(a) It is important to remember which variables are being treated as “fixed” and
which are, by their own nature or by the way in which the model is written,
stochastic. This is particularly true when equations in these variables are being
manipulated. For example, if we denote the random variable by a lower-case letter
and a fixed variable by a capital, the regression

$$y = \beta X + \varepsilon \tag{50.129}$$
is not the same thing as
$$x = \frac{1}{\beta} Y - \frac{1}{\beta} \varepsilon. \tag{50.130}$$


(b) The point becomes of particular interest in time-series wherein the same variable
$u_t$ may occur in lagged form $u_{t-1}$, $u_{t-2}$, etc. In the equation

$$u_t = \rho u_{t-1} + \varepsilon_t, \tag{50.131}$$
we should usually regard both $u_t$ and $u_{t-1}$ as random variables. However, at time t,
$u_{t-1}$ has already occurred and is known. It is thus not random in one sense; for
example, if (50.131) is regarded as a predictive equation, we are interested in the
conditional variable $u_t \mid u_{t-1}$, not the joint distribution of $u_t$ and $u_{t-1}$.

(c) It will be clear, and was forcibly brought to notice by Haavelmo (1943), that
estimation of the constants in a subset of equations, instead of the whole set, may
result in bias. Thus there is always a further source of error in estimation which
must not be forgotten: we may have omitted part of the model.

(d) The nature of the data available sometimes leads to the specification of incorrect
models. For example, the demand for a commodity influences its price, and its
price influences supply. But to write

$$d_t = f(p_t), \qquad p_t = g(s_t) \tag{50.132}$$
overlooks a fundamental property of the system, in that there may be a lag before
a change in one variable affects the other. The lag may be so short that its effect
does not appear in any statistical evidence we are able to collect; but to ignore it is
to destroy the utility of the model.


50.43 A Scandinavian school led by Wold (1964) has insisted on confining economic
models to what is known as the “causal-chain” approach, and much of what they have
to say is relevant to the general problem of analysing dynamic systems. The phenomenon
under study is conceived of as a chain of causation. A behaviour variable
(observable) is subject to causal influences specified by a number of explanatory variables
and is influenced by other behaviour variables only through the explanatory set.
Theoretically, perhaps, relations expressing the dependence of behaviour variables on
explanatory ones should be lagged in time. But when this is not possible the equations
are to be regarded as asymmetrical and read from left to right, e.g. the dependence of
price on demand, say the simple linear equation

$$p = \alpha d, \tag{50.133}$$
is not invertible to give
$$d = \frac{1}{\alpha} p. \tag{50.134}$$


The literature on model building is scattered, inadequate, and incomplete. That 
on the statistical analysis of models is worse. A monograph by Fisk (1967) gives a 
useful account of problems associated with sets of equations. For the causal-chain 
method see the collection of papers edited by Wold (1964). See also the collection of 
papers due to C-E-I-R (1968). 


Forecasting 

50.44 One of the main objects of time-series analysis is to be able to predict the 
behaviour of the system under study over some future period of time; or, at least, to 
be able to see whether prediction within acceptable limits of error is possible. 

Two approaches to the problem are available. In the first we adopt a purely 
statistical approach: the past behaviour of a series is studied, and on the assumption 
that the generating system is constant an attempt is made to project the series into 
the future without a detailed study of the generating system itself. Thus, given an 
autoregressive series and having estimated its constants, we may write, for example, 

$$u_t = -\alpha_1 u_{t-1} - \alpha_2 u_{t-2} + \varepsilon_t, \tag{50.135}$$
and estimate $u_t$ (a) by substituting the known values of $u_{t-1}$ and $u_{t-2}$ in this equation,
and (b) by assuming that the best estimate we can make of the disturbance term $\varepsilon$ is
to equate it to zero. If we have, from previous experience, estimates of the variances
of $\varepsilon$ and of our estimators of $\alpha_1$ and $\alpha_2$, we may put confidence intervals round the
estimate of $u_t$.
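
As a rough illustration of steps (a) and (b), the sketch below forms the point forecast from a second-order scheme written with the sign convention of (50.135). The interval it returns is a simplification and an assumption of this example: it allows only for the disturbance variance, treats the disturbance as normal, and ignores the sampling error of the estimated coefficients. The function name and the numerical values are hypothetical.

```python
def ar2_forecast(u_prev1, u_prev2, alpha1, alpha2, sigma_eps, z=1.96):
    """One-step forecast of u_t from u_{t-1} (u_prev1) and u_{t-2} (u_prev2),
    for the scheme u_t = -alpha1*u_{t-1} - alpha2*u_{t-2} + eps_t."""
    point = -alpha1 * u_prev1 - alpha2 * u_prev2  # disturbance set to zero
    half_width = z * sigma_eps  # ignores uncertainty in alpha1 and alpha2
    return point, (point - half_width, point + half_width)


point, interval = ar2_forecast(
    u_prev1=1.0, u_prev2=0.5, alpha1=-1.1, alpha2=0.5, sigma_eps=1.0
)
print(point, interval)
```

Allowing for the variances of the coefficient estimators, as the text suggests, would widen the interval.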


50.45 This frankly empirical approach is based on the assumptions (a) that the
system is such that an autoregressive scheme (or some other chosen scheme) is a good
approximation to the effect of the true generating mechanism, and (b) that such
mechanism is not changing, or at any rate not changing rapidly enough to impair the
supposition that we may use the equation based on past experience to represent its behaviour
in the future. If, however, we wish to delve more deeply into the nature of the
generator, we must set up a model; that is to say we must try to write down in specific form
the relationships which condition the motion of the system. This is a more complicated
exercise, involving on the one hand a much greater insight into the causal mechanisms at
work, and on the other hand a lot more effort in estimating the various quantities
involved. The tendency has been for statisticians to prefer the simpler approach and
to extrapolate from past experience without attempting to set up a model. This may
well be the more rewarding approach for prediction in the short term. But it does not
enable us to predict what would happen if we altered the system.


50.46 If it has been found that an autoregressive scheme or a scheme of regression 
satisfactorily fits past experience, there remains little to be said about the forecasting 
problem. We merely use the authenticated relationship to predict future values. This 
can be done for any form of time-series. If it has been decomposed into elements 
such as trend, seasonal, and oscillatory series, we predict the future of each element 
and reassemble them to forecast the future of the original series. As we remarked at 
an earlier stage, the underlying supposition is that the various elements are causally 
independent. 


50.47 In practice it is often found that schemes of order two are as satisfactory 
as such schemes can be, i.e. little is gained by adding extra terms. In fact, a good 
deal of attention has been given to the case where the scheme is of order one, namely 
is a Markoff scheme. The prediction equation is then very simple but possibly too 
simple. A heuristic approach suggested by Holt (1957) has some attractive features. 

We consider a scheme of autoregressive type, 

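$$u_{t+1} = \alpha u_t + \alpha(1-\alpha) u_{t-1} + \alpha(1-\alpha)^2 u_{t-2} + \cdots + \alpha(1-\alpha)^k u_{t-k} + \varepsilon_{t+1}. \tag{50.136}$$
Considered as a predictor this has a certain intuitive appeal if $|1-\alpha| < 1$, for then
the terms contribute less and less to $u_{t+1}$ as we go back in time. If we estimate $u_{t+1}$
by the systematic component of (50.136), i.e. ignore $\varepsilon$, we have

$$\begin{aligned}
\operatorname{Est} u_{t+1} - \operatorname{Est} u_t &= \alpha \sum_{j=0}^{k} (1-\alpha)^j u_{t-j} - \alpha \sum_{j=0}^{k} (1-\alpha)^j u_{t-1-j} \\
&= \alpha (u_t - \operatorname{Est} u_t) - \alpha (1-\alpha)^{k+1} u_{t-k-1}.
\end{aligned} \tag{50.137}$$

For $|1-\alpha|$ not too close to unity and moderately large k we may write

$$\operatorname{Est} u_{t+1} - \operatorname{Est} u_t = \alpha (u_t - \operatorname{Est} u_t) = \alpha \varepsilon_t. \tag{50.138}$$
Suppose $\alpha$ known. At any time-point t+1 we know $\varepsilon_t$; it was the error of estimate at
time t. Thus we simply estimate $u_{t+1}$ by taking the estimate at time t and adding $\alpha \varepsilon_t$.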
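
The updating rule can be written in a few lines. The sketch below is illustrative only: the function names are hypothetical, and the starting estimate `initial` is a choice the text does not prescribe. The second function evaluates the weighted sum directly, using every available past value, so that the two forms can be compared.

```python
def ewma_forecasts(u, alpha, initial):
    """Return the successive estimates, starting from `initial`, using
    Est u_{t+1} = Est u_t + alpha * (u_t - Est u_t)."""
    est = [initial]
    for value in u:
        error = value - est[-1]  # error of estimate, known once u_t is seen
        est.append(est[-1] + alpha * error)
    return est


def weighted_sum_forecast(u, alpha):
    """Systematic part of the weighted sum: alpha * sum (1-alpha)^j u_{t-j}."""
    return sum(alpha * (1 - alpha) ** j * u[-1 - j] for j in range(len(u)))


series = [1.0, 2.0, 3.0]
print(ewma_forecasts(series, alpha=0.5, initial=1.0))
print(weighted_sum_forecast(series, alpha=0.5))
```

When the starting estimate is zero the recursion and the weighted sum give the same forecast; with any other starting value they differ by a term that dies away as $(1-\alpha)^n$.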


50.48 The estimation of the parameter $\alpha$ is not a simple matter. The most
straightforward approach, given enough computational assistance, is to try a range of
values of $\alpha$ and to calculate the sum

$$\sum_t \{u_{t+1} - \alpha u_t - \cdots - \alpha(1-\alpha)^k u_{t-k}\}^2$$
for different values of k, selecting the values of $\alpha$ and k which minimize it. In practice
it seems that one does not need great precision in the exact determination of optimal $\alpha$.
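
A grid search of this kind might be sketched as follows. This is a simplified illustration rather than the procedure exactly as stated: it uses the recursive form of the predictor, so that all available past values enter and only $\alpha$ is searched, and it starts the recursion at the first observation. The grid, the function names and the test series are assumptions of the example.

```python
def one_step_sse(u, alpha):
    """Sum of squared one-step errors of the recursive predictor,
    started at the first observation."""
    estimate = u[0]
    total = 0.0
    for value in u[1:]:
        error = value - estimate
        total += error * error
        estimate += alpha * error
    return total


def best_alpha(u, grid):
    """Return the grid value of alpha with the smallest squared-error sum."""
    return min(grid, key=lambda a: one_step_sse(u, a))


grid = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
trend = [float(t) for t in range(20)]  # a steadily rising series
alpha_star = best_alpha(trend, grid)
print(alpha_star, one_step_sse(trend, alpha_star))
```

For a steadily rising series the predictor lags behind, so the search favours the largest value on the grid; for a noisy series without trend a smaller value would usually be chosen.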

Systems of type (50.136) are known, for obvious reasons, as exponentially weighted 
moving-average predictors. They have been studied in more detail by Brown (1959), 
Barnard (1959), Cox (1961), Box and Jenkins (1962), and Ward (1963). Winters (1960) 
proposed an extension which includes seasonal movements. 


50.49 At this point we must end, realizing that there are some branches of the 
subject which might have been discussed at greater length and many more which 
remain for future development. The reader who has stayed the course thus far will,
we hope, be willing to make allowances for the shortcomings of our work. A subject
which is growing as rapidly as ours does not easily shake down into a coherent structure.
Nor is this a matter for regret, in that so many of the changes in emphasis are due to
growth and an abundant vitality which will undoubtedly carry our subject to further 
triumphs. We acknowledge our great indebtedness to the writers on whose work we 
have so freely drawn; we apologize for our errors and omissions; and we write these 
final words with a considerable sense of relief. 


BIBLIOGRAPHICAL NOTE 


A comprehensive Bibliography of Statistical Literature by M. G. Kendall and Alison G. Doig 
(Oliver and Boyd, Edinburgh), covering about 30,000 items from the sixteenth century up to 1958, 
is now available. Volume 1 (1962) covers the years 1950-58, Volume 2 (1965) the years 1940-49,
and Volume 3 (1966) the years up to 1939. From 1959 onwards reference should be made to
the Journal of Statistical Abstracts, published for the International Statistical Institute periodically. 

There are also in existence a number of specialized bibliographies, including a particularly 
fine one issued under the editorship of H. Wold (1965). 


EXERCISES 


50.1 If $V_n$ is the determinant of the matrix of (50.12), show that
$$V_n = (1+\beta^2) V_{n-1} - \beta^2 V_{n-2}$$
and hence that
$$V_n = (1-\beta^{2n+2}) / (1-\beta^2).$$


50.2 In the notation of 50.9 show that, for a Markoff scheme, 
Фу = 15+2prj-1+prj-2, 
and hence that such a scheme is inadequate to represent the series of Example 50.2. 


50.3 For у of equation (50.26) show that if G(z) is the autocovariance function of e, that of 
ah © © 
X a i) = =) Ge) 
=- 0 --ә 


and hence that є and 7 have the same autocovariance generating function, 
50.4 Verify equation (50.32). 


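50.5 (Progressive solution of the Yule-Walker equations (47.66).) A linear autoregressive
scheme is of order k. If the coefficients $\alpha$ are calculated on the assumption that it is of order
$s < k$, giving $\alpha_{s1}, \alpha_{s2}, \ldots, \alpha_{ss}$, and $\alpha_{11} = -\rho_1$, show that

$$\alpha_{st} = \alpha_{s-1,t} + \alpha_{ss}\, \alpha_{s-1,s-t}, \qquad t = 1, 2, \ldots, s-1,$$
$$\alpha_{ss} = -\frac{\rho_s + \alpha_{s-1,1}\rho_{s-1} + \alpha_{s-1,2}\rho_{s-2} + \cdots + \alpha_{s-1,s-1}\rho_1}{1 + \alpha_{s-1,1}\rho_1 + \cdots + \alpha_{s-1,s-1}\rho_{s-1}}, \qquad s = 2, 3, \ldots, k.$$
(Durbin, 1960b)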
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50.6 In the notation of 50.10 show that if the likelihood is written as (1—6*)~* f(Q) da, 
so that 


Í f(Q) da = (1—83), 


5%) = O(n-?) 


ав 
Оо CO 
5(°0) = (29) +O(n-). 


Hence show that approximately 
b = E| 20 У 
varb = E\ 3p 
and derive equation (50.48). (Durbin, 1959b) 
50.7 Verify equation (50.88). 


50.8 If two series of linear autoregressive or moving-average type are generated from the 
same series of random elements, show that for all А 


о z 
X panes; E разр: 
$--—o {=-© 


(Quenouille, 1957) 


50.9 In Example 50.6, if the matrices F are the A-matrices of an autoregressive scheme, 
show that only one determines a process which is stationary. 


50.10 Generally in 50.26, by considering the case where B is the identity matrix, discuss 
the conditions under which an autoregressive scheme has an identifiable stationary solution. 
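
The progressive solution of Exercise 50.5 lends itself to a short numerical check. The sketch below assumes the sign convention in which the scheme is written $u_t + \sum_j \alpha_j u_{t-j} = \varepsilon_t$, so that the Yule-Walker equations read $\rho_j + \sum_t \alpha_t \rho_{|j-t|} = 0$; the function names and the trial correlations are hypothetical. It is offered as a way of verifying the recursion numerically, not as a solution of the exercise.

```python
def durbin_recursion(rho):
    """rho[0] = 1, rho[1..k] are autocorrelations. Returns a list whose
    entry s-1 holds [alpha_{s,1}, ..., alpha_{s,s}] for s = 1..k."""
    k = len(rho) - 1
    prev = []  # coefficients of the scheme of order s-1
    fitted = []
    for s in range(1, k + 1):
        num = rho[s] + sum(prev[t - 1] * rho[s - t] for t in range(1, s))
        den = 1.0 + sum(prev[t - 1] * rho[t] for t in range(1, s))
        a_ss = -num / den
        cur = [prev[t - 1] + a_ss * prev[s - t - 1] for t in range(1, s)]
        cur.append(a_ss)
        fitted.append(cur)
        prev = cur
    return fitted


def yule_walker_residuals(rho, alpha):
    """Left-hand sides rho_j + sum_t alpha_t rho_|j-t|, j = 1..len(alpha)."""
    k = len(alpha)
    return [
        rho[j] + sum(alpha[t - 1] * rho[abs(j - t)] for t in range(1, k + 1))
        for j in range(1, k + 1)
    ]


rho = [1.0, 0.6, 0.2, -0.1]
orders = durbin_recursion(rho)
print(orders[-1], yule_walker_residuals(rho, orders[-1]))
```

Each stage reuses the coefficients of the previous order, so the order of the scheme can be raised one step at a time without solving the full set of equations afresh.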


Envoi to Volume 3 


“Before your going down at the end of the Parliament, I 
thought good to deliver unto you certain notes for your observa- 
tion, that serve aptly for the present time, to be imported after- 
wards when you shall come abroad... . 

“ Yourselves can witness that I never entered into the examina- 
tion of any cause without advisement, carrying ever a single 
eye to justice and truth; for, though I were content to hear matters 
argued and debated pro and contra, as all princes must that will 
understand what is right, yet I look ever as it were upon a plain 
table wherein is written neither partiality nor prejudice.” 


ELIZABETH I, to her last Parliament 


APPENDIX TABLES 


1 The frequency function of the normal distribution
2 The distribution function of the normal distribution
3 Quantiles of the d.f. of χ²
4a The distribution function of χ² for one degree of freedom, 0 ≤ χ² ≤ 1
4b The distribution function of χ² for one degree of freedom, 1 ≤ χ² ≤ 10
5 Quantiles of the d.f. of t
6 5 per cent. points of z
7 5 per cent. points of F
8 1 per cent. points of z
9 1 per cent. points of F
10 Symmetric functions. Augmented symmetrics in terms of power-sums and
vice versa
Appendix Table 1 Frequency function of the normal distribution $y = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}x^2}$ with
first and second differences
x    y    Δ¹(−)    Δ²        x    y    Δ¹(−)    Δ²
oo 0:39894 | 199 | —392 25 001753 395 | +79 
от 039695 | 591 — 374 2:6 001358 316 | +66 
o2 0'39104 | 965 —347 27 001042 250 | +53 
o3 038139 1312 —308 28 000792 197 | +45 
o4 036827 1620 |  —265 29 000595 152 | +36 
| | | 
o5 | 0835207 1885 —212 30 0700443 116 | +27 
o6 0:33322 2097 —159 31 000327 89 +23 
07 031225 2256 —104 32 0:00238 66 | +17 
o8 0:28969 2360 = 52 3:3 0'00172 49 | +132 
o9 026609 | 2412 o 34 000123 36: |- “Fro 
| 
| 
ro 024197 2412 + 46 35 0:00087 26 +7 
тї 021785 2366 + 84 36 о-оооб1 19 + 6 
r2 019419 2282 +118 37 000042 13 +4 
r3 017137 2164 +143 38 000029 9 +2 
14 014973 2021 +161 39 000020 Wee ELEC 
rs 012952 1860 +173 40 000013 4 — 
r6 O'11092 | 1687 +177 41 000009 3 — 
17 0'09405 |  I510 +177 42 000006 2 — 
r8 0:07895 | 1333 +170 43 0'00004 5 — 
1'9 0:06562 1163 +162 44 0'00002 Бы — 
zo 005399 | тоот +150 45 000002 — — 
21 0'04398 851 +137 46 о'ооооІ — — 
22 0°03547 714 +120 47 o-00001 — — 
2:3 002833 594 | + 108 48 000000 — — 
24 002239 | 486 | + 91 
Appendix Table 2 Distribution function of the normal distribution
The table shows the area under the curve $y = (2\pi)^{-1/2} e^{-\frac{1}{2}x^2}$ lying to the left of specified deviates x;
e.g. the area corresponding to a deviate 1.86 (= 1.5 + 0.36) is 0.9686.


Deviate    0.0+    0.5+    1.0+    1.5+    2.0+    2.5+    3.0+    3.5+
ооо 5000 6915 8413 9332 9772 | 97379 9'865 9°77 
осот 5040 6950 8438 9345 9778 9*396 92869 9°78 
ооз 5080 6985 8461 9357 9783 9°413 9874 9°78 
ооз 5120 7019 8485 9370 9788 9'430 | 91878 9°79 
oo4 | 5160 7054 8508 9382 9793 9°, 92882 9°80 
oos 5199 7088 8531 9394 9798 9461 9886 9°81 
ооб 5239 7123 8554 9406 98оз 9°477 | 9:889 | 9°81 
007 5279 7157 8577 9418 9808 97492 | 9'893 9°82 
0-08 5319 | 7190 8599 9429 9812 9%506 | 92897 9°83 
009 5359 | 7224 8621 9441 9817 9520 92900 9°83 
оло 5398 7257 | 8643 9452 9821 9534 | 9%03 9°84 
очі 5438 7291 | 8665 9463 9826 92547 | 9206 985 
оз? 5478 | 7324 | 8686 9474 9830 9°560 9810 9°85 
o3 5517 | 7357 | 8708 9484 9834 95573 9°13 9°86 
org 5557 | 7389 8729 9495 9838 9*585 9°16 9°86 
os 5596 7422 | 8749 9505 9842 | 9'598 | 9*8 9°87 
016 5636 7454 | 8770 9515 9846 9*6og | 9°21 9°87 
017 5675 7486 8790 9525 9850 9621 9°24 9°88 
o1i8 5714 7517 | 8810 9535 9854 9632 9°26 9°88 
0-19 5753 7549 | 8830 9545 9857 | 9643 9°29 9°89 
ото 5793 7580 8849 9554 9861 | 97653 | 9°31 9°89 
о"21 5832 | 7611 8869 9564 9864 | 92664 9°34 9°90 
022 5871 | 7642 8888 9573 9868 92674 9°36 9°90 
023 5910 | 7673 | 8907 9582 9871 92683 | 9738 gto4 
024 5948 | 7704 | 8925 9591 9875 | 92693 | 9840 9408 
0-25 5987 7738 | 8944 9599 9878 92702 942 | 92 
0-26 6026 | 7764 8962 9608 9881 9°711 9°44 9*15 
0:27 6064 | 7794 | 8980 9616 9884 9*720 9°46 9118 
0-28 6103 7823 8997 9625 9887 | 92728 9°48 9*22 
0:29 6141 | 7852 9015 9633 9890 92736 9850 9'25 
030 6179 7881 9032 9641 9893 97744 9°52 9*28 
озі 6217 7910 | 9049 9649 9896 9°752 9°53 9*31 
032 6255 7939 | 9066 9656 9898 9760 | 9955 9433 
0533 6293 7967 9082 9664 9901 9:767 | 9°57 | 936 
034 6331 7995 | 9099 9671 9904 9274 | 9858 9439 
0°35 6368 8023 | 911$ 9678 9906 | 92781 обо | 94r 
0°36 6406 Бот | 9131 9686 9909 | 92788 | 9°61 943 
037 6443 8078 9147 9693 9911 | 92795 9262 946 
0:38 6480 8106 | 9162 9699 9913 9*801 964 948 
0:39 6517 8133 9177 9706 9916 9807 9°65 9450 
0'40 6554 8159 | 9192 9713 9918 9°813 9°66 9*52 
O'41 6591 | 8186 9207 9719 9920 92819 | 9°68 9154 
0'42 6628 8212 | 9222 9726 9922 9825 9269 9*56 
043 6664 8238 9236 9732 9925 9831 9°70 9458 
0-44 | 6700 8264 9251 9738 9927 92836 9°71 9459 
o45 | 6736 8289 9265 9744 9929 9284г 9°72 9%т 
046 6772 | 8315 | 9279 9750 9931 | 9846 | 9°73 9163 
047 6808 8340 9292 9756 9932 9°851 | 9°74 9*64 
048 6844 8365 9396 9761 9934 9856 9°75 966 
0:49 6879 | 8389 9319 9767 9936 9°861 | 9°76 9167 


Note. Decimal points in the body of the table are omitted. Repeated 9's are indicated
by powers, e.g. 9³71 stands for 0.99971.
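
Entries of this table can be checked against the error function in a standard library. The sketch below is a convenience for the reader and not part of the original tabulation; the function name `normal_df` is hypothetical, and the only formula used is the standard identity relating the normal distribution function to `erf`.

```python
import math


def normal_df(x):
    """Area under the standard normal curve to the left of x."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# The worked example in the table heading: deviate 1.86 = 1.5 + 0.36.
print(round(normal_df(1.86), 4))
```

The value for the deviate 1.86 agrees with the figure 0.9686 quoted in the heading of the table.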
Appendix Table 4a Distribution function of χ² for one degree of freedom for values
χ² = 0 to χ² = 1 by steps of 0.01

χ²    P    Δ        χ²    P    Δ

o 1:00000 7966 o'5o 047950 436 
oor 092034 3280 о'51 047514 430 
0-02 0:88754 2505 0-52 0°47084 423 
0°03 0:86249 2101 0:53 0:46661 418 
O04 0:84148 1842 0°54 0:46243 41I 
O05 082306 1656 0:55 0°45832 406 
0-06 080650 1516 0°56 0:45426 400 
©о7 079134 1404 057 045026 395 
o'o8 077730 1312 o'58 0:44631 389 
009 0776418 1235 0:59 044242 384 
оло 0°75183 1169 о'бо 043858 379 
Orr 0774014 тїї obr 043479 374 
o2 072903 тобо 0:62 043105 369 
013 0:71843 1015 0°63 0:42736 365 
O14 0:70828 974 0:64 042371 360 
O15 069854 938 0°65 042011 355 
o16 068916 905 0°66 041656 351 
017 oborr 874 0°67 041305 346 
o18 0:67137 845 0:68 040959 343 
o'19 0:66292 820 obg 040616 338 
020 0:65472 795 o7o 040278 334 
o2t 064677 773 0o71 039944 330 
022 063904 752 072 0:39614 326 
023 063152 731 0:73 0:39288 322 
024 062421 713 0'74 0:38966 318 
o25 0:61708 696 075 038648 315 
o26 061012 679 0:76 0:38333 311 
027 0:60333 663 077 0:38022 308 
o:28 0:59670 648 0:78 037714 304 
0:29 059022 634 079 0737410 301 
0:30 0:58388 620 о'8о 037109 297 
031 057768 607 о'81 0'36812 294 
0:32 o'57161 595 o82 0:36518 291 
0:33 o'56566 583 o:83 0:36227 287 
034 055983 572 0:84 0:35940 285 
0'35 O'55411 560 o'85 0:35655 281 
0°36 0°54851 551 0°86 0°35374 278 
0°37 054300 540 0°87 035096 276 
0:38 053760 530 o:88 0'34820 272 
0:39 053230 521 089 034548 270 
040 0:52709 512 0°90 0"34278 267 
O41 0:52197 503 о'91 034011 264 
042 051694 495 0:92 033747 261 
043 o'51199 487 0:93 033486 258 
044 0:50712 479 0'94 033228 256 
045 050233 471 0'95 0'32972 253 
046 049762 463 096 032719 251 
047 049299 457 0:97 032468 248 
0:48 048842 449 0:98 032220 246 
0'49 048393 443 0'99 031974 243 
0:50 047950 436 гоо 031731 241 
Appendix Table 4b Distribution function of χ² for one degree of freedom for values
of χ² from 1 to 10 by steps of 0.1

χ²    P    Δ        χ²    P    Δ


то 0°31731 2304 5'5 001902 106 
rr 029427 2095 56 001796 99 
r2 027332 191І 57 001697 94. 
r3 0'25421 1749 5:8 о'отбоз 89 
14 0:23672 1605 59 O'OI514 83 
rs 0:22067 1477 бо O°01431 79 
r6 0'20590 1361 6*1 001352 74 
17 019229 1258 62 001278 7 
r8 017971 1163 6:3 0'01207 66 
rg 0°16808 1078 64 O'O1141 62 
20 015730 1000 65 001079 59 
21 0°14730 929 66 0'01020 56 
22 0'13801 864 6:7 0'00964 52 
2:3 012937 803 68 0000912 50 
24 0"12134 749 6:9 000862 47 
25 o'11385 699 то 0:00815 44 
2:6 o'10686 651 71 0'00771 42 
27 0:10035 609 7? 000729 39 
28 009426 568 T3 000690 38 
29 008858 532 74 000652 35 
зо 0:08326 497 T5 0'00617 33 
3 007829 465 76 0°00584 32 
32 007364. 436 vigi 000552 30 
33 006928 408 78 000522 28 
34 0'06520 383 T9 000494. 26 
3'5 0'06137 359 8:0 0'00468 25 
3:6 005778 337 81 000443 24 
37 005441 316 82 000419 23 
3:8 0'05125 296 83 0:00396 21 
3'9 004829 279 84 000375 20 
40 0'04550 262 8:5 0'00355 19 
41 0'04288 246 8:6 0:00336 18 
42 004042 231 8-7 0700318 17 
43 003811 217 88 0°00301 16 
44 003594 205 8:9 000285 15 
45 0°03389 192 g'o 000270 14 
46 003197 181 or 0'00256 14 
47 0:03016 170 92 0700242 13 
48 0'02846 160 93 0:00229 12 
4'9 002686 ISI 9'4 0'00217 12 
5'0 002535 142 9'5 000205 10 
5 0:02393 134 96 0'00195 II 


52 0°02259 126 97 0'00184 то 
5'3 0702133 119 98 0:00174 9 
54 002014 112 99 0'00165 8 
55 001902 106 10'0 000157 8 
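
The tabulated P in Tables 4a and 4b appears to be the probability of exceeding the stated value of χ² (it falls from 1 at χ² = 0). On that reading, which is an inference from the entries rather than a statement in the text, the values can be reproduced from the complementary error function, since χ² with one degree of freedom is the square of a standard normal variate. The function name below is hypothetical.

```python
import math


def chi2_one_df_upper_tail(x):
    """P(chi-squared with 1 d.f. exceeds x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


for value in (0.01, 1.0, 10.0):
    print(value, round(chi2_one_df_upper_tail(value), 5))
```

The three values printed can be compared with the entries for χ² = 0.01, 1 and 10 in the tables.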
Appendix Table 6 5 per cent. points of the distribution of z
(values at which the d.f. = 0.95)

(Reprinted from Table VI of Sir Ronald Fisher's Statistical Methods for Research Workers,
Oliver and Boyd Ltd., Edinburgh, by kind permission of the author and publishers)

Values of ν₁

1    2    3    4    5    6    8    12    24    ∞


2:5421 2:6479 | 2:6870 | 2/7071 | 277194 | 277276 | 2/7380 2-7484 | 2-7588 | 2°7693 
14592 14722 | 1:4765 | 14787 | 14800 | 1:4808 | 14819 | 1:4830 | 14840 14851 
1:1577 11284 | 11137 | 11051 | 10994 | 170953 | 10899 | 1*0842 | 1:0781 | 1'0716 
1'0212 |0:9690 | 09429 | 0:9272 | 0-9168 | 09093 | 0:8993 | €'8885 | 0:8767 | 0:8639 
0°9441 | 0:8777 | o:8441 | 0°8236 | 0:8097 | 077997 | 07862 | 07714 | о:7550 07368 
0:8948 | 0:8188 | 077798 | 077558 07394 | 077274 | o'7112 | o*6931 | 96729 | 0:6499 
0:8606 | 077777 | 0°7347 | 0/7080 о0:6896 | 06761 | 0:6576 | 06369 | 06134 | o:5862 
   8   0.8355  0.7475  0.7014  0.6725  0.6525  0.6378  0.6175  0.5945  0.5682  0.5371
   9   0.8163  0.7242  0.6757  0.6450  0.6238  0.6080  0.5862  0.5613  0.5324  0.4979
  10   0.8012  0.7058  0.6553  0.6232  0.6009  0.5843  0.5611  0.5346  0.5035  0.4657

  11   0.7889  0.6909  0.6387  0.6055  0.5822  0.5648  0.5406  0.5126  0.4795  0.4387
  12   0.7788  0.6786  0.6250  0.5907  0.5666  0.5487  0.5234  0.4941  0.4592  0.4156
  13   0.7703  0.6682  0.6134  0.5783  0.5535  0.5350  0.5089  0.4785  0.4419  0.3957
  14   0.7630  0.6594  0.6036  0.5677  0.5423  0.5233  0.4964  0.4649  0.4269  0.3782
  15   0.7568  0.6518  0.5950  0.5585  0.5326  0.5131  0.4855  0.4532  0.4138  0.3628

  16   0.7514  0.6451  0.5876  0.5505  0.5241  0.5042  0.4760  0.4428  0.4022  0.3490
  17   0.7466  0.6393  0.5811  0.5434  0.5166  0.4964  0.4676  0.4337  0.3919  0.3366
  18   0.7424  0.6341  0.5753  0.5371  0.5099  0.4894  0.4602  0.4255  0.3827  0.3253
  19   0.7386  0.6295  0.5701  0.5315  0.5040  0.4832  0.4535  0.4182  0.3743  0.3151
  20   0.7352  0.6254  0.5654  0.5265  0.4986  0.4776  0.4474  0.4116  0.3668  0.3057

  21   0.7322  0.6216  0.5612  0.5219  0.4938  0.4725  0.4420  0.4055  0.3599  0.2971
  22   0.7294  0.6182  0.5574  0.5178  0.4894  0.4679  0.4370  0.4001  0.3536  0.2892
  23   0.7269  0.6151  0.5540  0.5140  0.4854  0.4636  0.4325  0.3950  0.3478  0.2818
  24   0.7246  0.6123  0.5508  0.5106  0.4817  0.4598  0.4283  0.3904  0.3425  0.2749
  25   0.7225  0.6097  0.5478  0.5074  0.4783  0.4562  0.4244  0.3862  0.3376  0.2685

  26   0.7205  0.6073  0.5451  0.5045  0.4752  0.4529  0.4209  0.3823  0.3330  0.2625
  27   0.7187  0.6051  0.5427  0.5017  0.4723  0.4499  0.4176  0.3786  0.3287  0.2569
  28   0.7171  0.6030  0.5403  0.4992  0.4696  0.4471  0.4146  0.3752  0.3248
  29   0.7155  0.6011  0.5382  0.4969  0.4671  0.4444  0.4117  0.3720  0.3211  0.2466
  30   0.7141  0.5994  0.5362  0.4947  0.4648  0.4420  0.4090  0.3691  0.3176

  60   0.6933  0.5738  0.5073  0.4632  0.4311  0.4064  0.3702  0.3255  0.2654  0.1644

   ∞   0.6729  0.5486  0.4787  0.4319  0.3974  0.3706  0.3309  0.2804  0.2085  0
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Appendix Table 7 5 per cent. points of the variance ratio F 
(values at which the d.f. = 0.95)
(Reproduced from Sir Ronald Fisher and Dr F. Yates: Statistical Tables for Biological,
Medical and Agricultural Research, Oliver and Boyd Ltd., Edinburgh, by kind permission
of the authors and publishers)


   ν₂ \ ν₁ 1       2       3       4       5       6       8      12      24       ∞

    1  161.40  199.50  215.70  224.60  230.20  234.00  238.90  243.90  249.00  254.30
    2   18.51   19.00   19.16   19.25   19.30   19.33   19.37   19.41   19.45   19.50
    3   10.13    9.55    9.28    9.12    9.01    8.94    8.84    8.74    8.64    8.53
    4    7.71    6.94    6.59    6.39    6.26    6.16    6.04    5.91    5.77    5.63
    5    6.61    5.79    5.41    5.19    5.05    4.95    4.82    4.68    4.53    4.36

    6    5.99    5.14    4.76    4.53    4.39    4.28    4.15    4.00    3.84    3.67
    7    5.59    4.74    4.35    4.12    3.97    3.87    3.73    3.57    3.41    3.23
    8    5.32    4.46    4.07    3.84    3.69    3.58    3.44    3.28    3.12    2.93
    9    5.12    4.26    3.86    3.63    3.48    3.37    3.23    3.07    2.90    2.71
   10    4.96    4.10    3.71    3.48    3.33    3.22    3.07    2.91    2.74    2.54

   11    4.84    3.98    3.59    3.36    3.20    3.09    2.95    2.79    2.61    2.40
   12    4.75    3.88    3.49    3.26    3.11    3.00    2.85    2.69    2.50    2.30
   13    4.67    3.80    3.41    3.18    3.02    2.92    2.77    2.60    2.42    2.21
   14    4.60    3.74    3.34    3.11    2.96    2.85    2.70    2.53    2.35    2.13
   15    4.54    3.68    3.29    3.06    2.90    2.79    2.64    2.48    2.29    2.07

   16    4.49    3.63    3.24    3.01    2.85    2.74    2.59    2.42    2.24    2.01
   17    4.45    3.59    3.20    2.96    2.81    2.70    2.55    2.38    2.19    1.96
   18    4.41    3.55    3.16    2.93    2.77    2.66    2.51    2.34    2.15    1.92
   19    4.38    3.52    3.13    2.90    2.74    2.63    2.48    2.31    2.11    1.88
   20    4.35    3.49    3.10    2.87    2.71    2.60    2.45    2.28    2.08    1.84

   21    4.32    3.47    3.07    2.84    2.68    2.57    2.42    2.25    2.05    1.81
   22    4.30    3.44    3.05    2.82    2.66    2.55    2.40    2.23    2.03    1.78
   23    4.28    3.42    3.03    2.80    2.64    2.53    2.38    2.20    2.00    1.76
   24    4.26    3.40    3.01    2.78    2.62    2.51    2.36    2.18    1.98    1.73
   25    4.24    3.38    2.99    2.76    2.60    2.49    2.34    2.16    1.96    1.71

   26    4.22    3.37    2.98    2.74    2.59    2.47    2.32    2.15    1.95    1.69
   27    4.21    3.35    2.96    2.73    2.57    2.46    2.30    2.13    1.93    1.67
   28    4.20    3.34    2.95    2.71    2.56    2.44    2.29    2.12    1.91    1.65
   29    4.18    3.33    2.93    2.70    2.54    2.43    2.28    2.10    1.90    1.64
   30    4.17    3.32    2.92    2.69    2.53    2.42    2.27    2.09    1.89    1.62

   40    4.08    3.23    2.84    2.61    2.45    2.34    2.18    2.00    1.79    1.51
   60    4.00    3.15    2.76    2.52    2.37    2.25    2.10    1.92    1.70    1.39
  120    3.92    3.07    2.68    2.45    2.29    2.17    2.02    1.83    1.61    1.25

    ∞    3.84    2.99    2.60    2.37    2.21    2.09    1.94    1.75    1.52    1.00


Lower 5 per cent. points are found by interchange of ν₁ and ν₂, i.e. ν₁ must always correspond
to the greater mean square.
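
The interchange rule in the footnote can be illustrated with a short Python sketch. It assumes the usual reading of the rule: the lower 5 per cent. point for (ν₁, ν₂) degrees of freedom is the reciprocal of the tabulated upper point for (ν₂, ν₁). The names `UPPER_5` and `lower_point` are illustrative only, and just two entries of the table are copied in.

```python
# Upper 5 per cent. points of F keyed by (nu1, nu2); the two values are
# copied from Appendix Table 7. The dictionary name is an assumption.
UPPER_5 = {
    (3, 12): 3.49,
    (12, 3): 8.74,
}


def lower_point(nu1, nu2, upper=UPPER_5):
    """Lower 5 per cent. point of F with (nu1, nu2) degrees of freedom.

    Assumes the standard reciprocal relation: interchange the degrees of
    freedom, look up the upper point, and invert it.
    """
    return 1.0 / upper[(nu2, nu1)]


if __name__ == "__main__":
    # Lower point for (3, 12) d.f. uses the tabulated upper point for (12, 3).
    print(round(lower_point(3, 12), 4))
    print(round(lower_point(12, 3), 4))
```

A full implementation would hold the whole table; the sketch only shows which entry is looked up after the interchange.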
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Appendix Table 8 1 per cent. points of the distribution of = 


(values at which the d.f. = 0.99)


(Reprinted from Table VI of Sir Ronald Fisher’s Statistical Methods for Research Workers, 
Oliver and Boyd Ltd., Edinburgh, by kind permission of the author and publishers) 


                                     Values of ν₁
  ν₂        1       2       3       4       5       6       8      12      24       ∞

   1   4.1535  4.2585  4.2974  4.3175  4.3297  4.3379  4.3482  4.3585  4.3689  4.3794
   2   2.2950  2.2976  2.2984  2.2988  2.2991  2.2992  2.2994  2.2997  2.2999  2.3001
   3   1.7649  1.7140  1.6915  1.6786  1.6703  1.6645  1.6569  1.6489  1.6404  1.6314
   4   1.5270  1.4452  1.4075  1.3856  1.3711  1.3609  1.3473  1.3327  1.3170  1.3000
   5   1.3943  1.2929  1.2449  1.2164  1.1974  1.1838  1.1656  1.1457  1.1239  1.0997
   6   1.3103  1.1955  1.1401  1.1068  1.0843  1.0680  1.0460  1.0218  0.9948  0.9643
   7   1.2526  1.1281  1.0672  1.0300  1.0048  0.9864  0.9614  0.9335  0.9020  0.8658
   8   1.2106  1.0787  1.0135  0.9734  0.9459  0.9259  0.8983  0.8673  0.8319  0.7904
   9   1.1786  1.0411  0.9724  0.9299  0.9006  0.8791  0.8494  0.8157  0.7769  0.7305
  10   1.1535  1.0114  0.9399  0.8954  0.8646  0.8419  0.8104  0.7744  0.7324  0.6816

  11   1.1333  0.9874  0.9136  0.8674  0.8354  0.8116  0.7785  0.7405  0.6958  0.6408
  12   1.1166  0.9677  0.8919  0.8443  0.8111  0.7864  0.7520  0.7122  0.6649  0.6061
  13   1.1027  0.9511  0.8737  0.8248  0.7907  0.7652  0.7295  0.6882  0.6386  0.5761
  14   1.0909  0.9370  0.8581  0.8082  0.7732  0.7471  0.7103  0.6675  0.6159  0.5500
  15   1.0807  0.9249  0.8448  0.7939  0.7582  0.7314  0.6937  0.6496  0.5961  0.5269
  16   1.0719  0.9144  0.8331  0.7814  0.7450  0.7177  0.6791  0.6339  0.5786  0.5064
  17   1.0641  0.9051  0.8229  0.7705  0.7335  0.7057  0.6663  0.6199  0.5630  0.4879
  18   1.0572  0.8970  0.8138  0.7607  0.7232  0.6950  0.6549  0.6075  0.5491  0.4712
  19   1.0511  0.8897  0.8057  0.7521  0.7140  0.6854  0.6447  0.5964  0.5366  0.4560
  20   1.0457  0.8831  0.7985  0.7443  0.7058  0.6768  0.6355  0.5864  0.5253  0.4421

  21   1.0408  0.8772  0.7920  0.7372  0.6984  0.6690  0.6272  0.5773  0.5150  0.4294
  22   1.0363  0.8719  0.7860  0.7309  0.6916  0.6620  0.6196  0.5691  0.5056  0.4176
  23   1.0322  0.8670  0.7806  0.7251  0.6855  0.6555  0.6127  0.5615  0.4969  0.4068
  24   1.0285  0.8626  0.7757  0.7197  0.6799  0.6496  0.6064  0.5545  0.4890  0.3967
  25   1.0251  0.8585  0.7712  0.7148  0.6747  0.6442  0.6006  0.5481  0.4816  0.3872
  26   1.0220  0.8548  0.7670  0.7103  0.6699  0.6392  0.5952  0.5422  0.4748  0.3784
  27   1.0191  0.8513  0.7631  0.7062  0.6655  0.6346  0.5902  0.5367  0.4685  0.3701
  28   1.0164  0.8481  0.7595  0.7023  0.6614  0.6303  0.5856  0.5316  0.4626  0.3624
  29   1.0139  0.8451  0.7562  0.6987  0.6576  0.6263  0.5813  0.5269  0.4570  0.3550
  30   1.0116  0.8423  0.7531  0.6954  0.6540  0.6226  0.5773  0.5224  0.4519  0.3481

  60   0.9784  0.8025  0.7086  0.6472  0.6028  0.5687  0.5189  0.4574  0.3746  0.2352

   ∞   0.9462  0.7636  0.6651  0.5999  0.5522  0.5152  0.4604  0.3908  0.2913  0
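
Fisher's z is connected with the variance ratio by z = ½ logₑ F, so entries of this table can be cross-checked against the 1 per cent. points of F in Appendix Table 9. The Python sketch below shows the conversion in both directions. The function names are illustrative assumptions, and agreement is only to about three decimal places because the F values are tabulated to fewer figures than the z values.

```python
import math


def z_from_f(f):
    """Fisher's z corresponding to a variance ratio F: z = (1/2) ln F."""
    return 0.5 * math.log(f)


def f_from_z(z):
    """Variance ratio corresponding to Fisher's z: F = exp(2z)."""
    return math.exp(2.0 * z)


if __name__ == "__main__":
    # nu1 = 1, nu2 = 10: F = 10.04 in Appendix Table 9, z = 1.1535 in Table 8.
    print(round(z_from_f(10.04), 4))
    print(round(f_from_z(1.1535), 2))
```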
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Appendix Table 9 1 per cent. points of the variance ratio F 
(values at which the d.f. = 0.99)
(Reproduced from Sir Ronald Fisher and Dr F. Yates: Statistical Tables for Biological,
Medical and Agricultural Research, Oliver and Boyd Ltd., Edinburgh, by kind permission
of the authors and publishers)
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exl 
   ν₂ \ ν₁ 1       2       3       4       5       6       8      12      24       ∞

    1    4052    4999    5403    5625    5764    5859    5981    6106
    2   98.49   99.00   99.17   99.25   99.30   99.33   99.36   99.42
    3   34.12   30.81   29.46   28.71   28.24   27.91   27.49   27.05
    4   21.20   18.00   16.69   15.98   15.52   15.21   14.80   14.37
    5   16.26   13.27   12.06   11.39   10.97   10.67   10.27    9.89

    6   13.74   10.92    9.78    9.15    8.75    8.47    8.10    7.72
    7   12.25    9.55    8.45    7.85    7.46    7.19    6.84    6.47
    8   11.26    8.65    7.59    7.01    6.63    6.37    6.03    5.67
    9   10.56    8.02    6.99    6.42    6.06    5.80    5.47    5.11
   10   10.04    7.56    6.55    5.99    5.64    5.39    5.06    4.71

   11    9.65    7.20    6.22    5.67    5.32    5.07    4.74    4.40
   12    9.33    6.93    5.95    5.41    5.06    4.82    4.50    4.16
   13    9.07    6.70    5.74    5.20    4.86    4.62    4.30    3.96
   14    8.86    6.51    5.56    5.03    4.69    4.46    4.14    3.80
   15    8.68    6.36    5.42    4.89    4.56    4.32    4.00    3.67

   16    8.53    6.23    5.29    4.77    4.44    4.20    3.89    3.55
   17    8.40    6.11    5.18    4.67    4.34    4.10    3.79    3.45
   18    8.28    6.01    5.09    4.58    4.25    4.01    3.71    3.37
   19    8.18    5.93    5.01    4.50    4.17    3.94    3.63    3.30
   20    8.10    5.85    4.94    4.43    4.10    3.87    3.56    3.23

   21    8.02    5.78    4.87    4.37    4.04    3.81    3.51    3.17
   22    7.94    5.72    4.82    4.31    3.99    3.76    3.45    3.12    2.75
   23    7.88    5.66    4.76    4.26    3.94    3.71    3.41    3.07    2.70
   24    7.82    5.61    4.72    4.22    3.90    3.67    3.36    3.03    2.66
   25    7.77    5.57    4.68    4.18    3.86    3.63    3.32    2.99    2.62

   26    7.72    5.53    4.64    4.14    3.82    3.59    3.29    2.96    2.58
   27    7.68    5.49    4.60    4.11    3.78    3.56    3.26    2.93    2.55
   28    7.64    5.45    4.57    4.07    3.75    3.53    3.23    2.90    2.52
   29    7.60    5.42    4.54    4.04    3.73    3.50    3.20    2.87    2.49
   30    7.56    5.39    4.51    4.02    3.70    3.47    3.17    2.84    2.47

   40    7.31    5.18    4.31    3.83    3.51    3.29    2.99    2.66    2.29    1.80
   60    7.08    4.98    4.13    3.65    3.34    3.12    2.82    2.50    2.12    1.60
  120    6.85    4.79    3.95    3.48    3.17    2.96    2.66    2.34    1.95    1.38
    ∞    6.64    4.60    3.78    3.32    3.02    2.80    2.51    2.18    1.79    1.00



Lower 1 per cent. points are found by interchange of ν₁ and ν₂, i.e. ν₁ must always correspond
to the greater mean square.
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Appendix Table 10 Augmented symmetric functions in terms of power-sums and 


vice versa 


(Reproduced from F. N. David and Kendall (1949) by kind permission of Prof. David and the
editors of Biometrika)


weight 1
              [1]
(1)             1

weight 2
              [2]   [1²]
(2)             1     -1
(1)²            1      1

weight 3
              [3]   [21]   [1³]
(3)             1     -1      2
(2)(1)          1      1     -3
(1)³            1      3      1

weight 4
              [4]   [31]   [2²]  [21²]   [1⁴]
(4)             1     -1     -1      2     -6
(3)(1)          1      1      .     -2      8
(2)²            1      .      1     -1      3
(2)(1)²         1      2      1      1     -6
(1)⁴            1      4      3      6      1

weight 5
              [5]   [41]   [32]  [31²]  [2²1]  [21³]   [1⁵]
(5)             1     -1     -1      2      2     -6     24
(4)(1)          1      1      .     -2     -1      6    -30
(3)(2)          1      .      1     -1     -2      5    -20
(3)(1)²         1      2      1      1      .     -3     20
(2)²(1)         1      1      2      .      1     -3     15
(2)(1)³         1      3      4      3      3      1    -10
(1)⁵            1      5     10     10     15     10      1

weight 6
              [6]   [51]   [42]  [41²]   [3²]  [321]  [31³]   [2³] [2²1²]  [21⁴]   [1⁶]
(6)             1     -1     -1      2     -1      2     -6      2     -6     24   -120
(5)(1)          1      1      .     -2      .     -1      6      .      4    -24    144
(4)(2)          1      .      1     -1      .     -1      3     -3      5    -18     90
(4)(1)²         1      2      1      1      .      .     -3      .     -1     12    -90
(3)²            1      .      .      .      1     -1      2      .      2     -8     40
(3)(2)(1)       1      1      1      .      1      1     -3      .     -4     20   -120
(3)(1)³         1      3      3      3      1      3      1      .      .     -4     40
(2)³            1      .      3      .      .      .      .      1     -1      3    -15
(2)²(1)²        1      2      3      1      2      4      .      1      1     -6     45
(2)(1)⁴         1      4      7      6      4     16      4      3      6      1    -15
(1)⁶            1      6     15     15     10     60     20     15     45     15      1


To express the [ ] functions in terms of ( ), read downwards up to and including the main
diagonal, e.g. [41²] = 2 (6) - 2 (5) (1) - (4) (2) + (4) (1)².
To express the ( ) functions in terms of [ ], read across up to and including
the main diagonal, e.g. (4) (1)² = [6] + 2 [51] + [42] + [41²].
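
The two worked identities can be checked numerically by brute force. The Python sketch below assumes the usual definitions: the power-sum (r) is the sum of x_i^r over the sample, and the augmented symmetric function [p1 p2 ...] is the sum of x_i^p1 x_j^p2 ... over all ordered selections of distinct subscripts. The function names and the small integer sample are illustrative choices, not part of the table.

```python
from itertools import permutations
from math import prod


def power_sum(xs, r):
    """Power-sum (r): sum of x**r over the sample."""
    return sum(x ** r for x in xs)


def augmented(xs, parts):
    """Augmented symmetric function [p1 p2 ...].

    Sums x_i**p1 * x_j**p2 * ... over every ordered tuple of distinct
    subscripts (i, j, ...), which is the definition assumed here.
    """
    return sum(
        prod(xs[i] ** p for i, p in zip(idx, parts))
        for idx in permutations(range(len(xs)), len(parts))
    )


if __name__ == "__main__":
    xs = [1, 2, 3, 5]  # arbitrary integer sample, so the checks are exact

    def s(r):
        return power_sum(xs, r)

    # Reading down the [41^2] column of the weight-6 table.
    lhs = augmented(xs, (4, 1, 1))
    rhs = 2 * s(6) - 2 * s(5) * s(1) - s(4) * s(2) + s(4) * s(1) ** 2
    print(lhs == rhs)

    # Reading across the (4)(1)^2 row of the weight-6 table.
    row = (augmented(xs, (6,)) + 2 * augmented(xs, (5, 1))
           + augmented(xs, (4, 2)) + augmented(xs, (4, 1, 1)))
    print(s(4) * s(1) ** 2 == row)
```

Integer data keep the arithmetic exact, so any wrong coefficient or sign in a table entry shows up as a failed equality rather than a rounding difference.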
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Brewer, K. R. W., sampling with unequal 
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Brown, G. W., median tests, 108-9, (Exercises 
37.18-20) 117-18. 

Brown, R. G., exponential weighting, 501. 

Brunt, D., rainfall data, (Table 45.2) 343. 

Budne, T. A., random balance experiments, 
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Bulmer, M. G., confidence intervals in Model 
II AV, 71, (Exercise 36.15) 84. 

Burman, J. P., seasonal variation, 400. 

Buys-Ballot table, (Exercises 49.9-10) 470-1. 


Caliński, T., analysis of groups of experiments,
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Canonical, variables, 285-313; correlations, 
299-306, (Exercises 43.5-8, 43.10) 311-13; 
standard errors, 304-5; see Component 
analysis, Factor analysis, Latent roots. 

Carter, R. L., rotatable designs, 158. 
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Chacko, V. J., ordered alternatives in AV, 
49. 
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approximations, 260. 

Chew, V., review of multinormal intervals 
theory, 264. 

Circular, serial correlation, 362; processes, 
426; see Time-series. 

Classification, see Discrimination and classifica- 
tion. 
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AV, 31-4; two-way, (Example 35.5) 31; 
three-way, 34, (Exercise 35.3) 53; in 
Model II, 70-1; balanced two-way (Ex- 
ample 36.7) 70; balanced three-way, 
(Example 36.8) 70-1. 

Classification, mixed cross- and hierarchical, 
34-5. 

Classification, multi-way, in Model I AV, 
34-40, (Exercise 35.6) 53; in Model II, 
68-70, (Exercise 36.8) 83; permutation 
tests, 108. 

Classification, one-way, in Model I AV, 
(Example 35.1) 6, (Exercise 35.1) 52; 
balanced in Model II, (Examples 36.1-3, 
36.5) 58, 61, 64, 68, (Exercises 36.1, 
36.3, 36.5, 36.10-11, 36.13) 82-4; Model 
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II unbalanced, 72-4, (Exercises 36.12-13) 
84. 

Classification, ordered and metrical, 49, (Exer- 
cise 35.15) 55. 

Classification, three-way cross-, in Model I AV, 
35-9; balanced, (Example 35.6) 38, (Exer- 
cise 35.8) 54; disproportional frequencies, 
38; balanced, in Model II, 69-70; 
balanced in mixed model, 77; permutation 
tests, 108. 

Classification, two-way cross-, in Model I AV, 
10-31; proportional frequencies, 18, (Ex- 
ample 35.2) 19; equal frequencies 
(balanced), (Example 35.3) 23, (Exercise 
35.2) 52; disproportional frequencies, 26, 
(Example 35.4) 27, 28-30, (Exercises 
35.4-5, 35.7) 53, (Exercises 37.7-8) 115; 
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(Examples 36.4, 36.6) 65, 68, (Exercise 
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general model, 75-9; permutation tests, 
105-7; median tests, 109, (Exercises 37.18-20)
117-18.

Classified data, AV for, 11; see Classification, 
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Clatworthy, W. H., BIB tables, 143-4; PBIB, 
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336-9. 

Cobb, W., mixed model for AV, 79. 

Cochran, W. G., combination of AV tests, 42; 
robustness of AV, 98; inference in sample 
surveys, 119; BIB plans, 143; BIB 
analyses, 146; lattice designs, 154; con- 
founding, 157; sequences of experiments, 
158; sample surveys, 166; formation of 
strata, 185-6, (Exercise 39.18) 208; system- 
atic sampling, 188; ratio and regression 
estimators, 223; two-phase sampling, 228; 
domains of study, 229; random formation 
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tion, 327, (Example 44.5) 328-9, 329, 
331, (Exercises 44.6-9) 340-1. 

Cochrane, D., autocorrelated errors in regres- 
sion, 497-8. 
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randomized blocks, 138. 

Combination of AV tests, 40-3. 

Complete randomization, 79. 

Complete sets of orthogonal Latin squares, 136. 
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67. 
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vectors, geometrical interpretation, 286-9; 
standardization, 289; testing latent roots, 
291-4; and index numbers, 295; meteorological
data, (Example 43.4) 295-300; see
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Component analysis. 

Concomitant variables, 50, 211 f.n. 
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157. 
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Consul, P. C., distributions of LR criteria, 
271, 272, 280. 

Contrasts, 46; simultaneous confidence intervals
for all, 46-9, (Exercises 35.11-14,
35.16-19) 54-6.

Cornfield, J., expected MS in AV, 69; models in 
AV, 75, 77. 

Correlation coefficient, variance-stabilizing
transformation in normal samples, (Example
37.3) 93; canonical, 299-306; see
Autocorrelation, Serial correlation.

Correlation determinant, moments and asymp- 
totic distribution, (Example 41.4-5) 248- 
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Correlogram, 362, 404; and spectrum, 410; 
see Time-series.

Cost function, in survey theory, 182-3; for 
stratified sample, 183; for multi-stage 
sample, 201-2; for selection probabilities, 
202-4; for two-phase sampling, 226-7; 
for non-response, (Exercise 40.7) 236. 

Covariance, analysis of, 50-52, 105, 211 f.n. 

Cowden, D. J., moving averages, 374. 

Cox, D. R., sequential AV, 79; transforma- 
tions, 85-8, (Exercise 37.1) 113; analysis 
of residuals, 97; experimental inference, 
123; expected MS in Latin squares, 138; 
serial correlation, 449; exponential weight- 
ing, 501. 

Cox, G. M., BIB plans, 143; BIB analyses, 146; 
lattice designs, 154; confounding, 157; 
sequences of experiments, 158. 

Craddock, J. M., meteorological data com- 
ponent analysis, (Example 43.4) 295-300. 

Cross-classification, see Classification, two-way 
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Daniels, H. E., approximations to serial 


correlation distributions, 446-9; spectrum 
analysis, 467. 

Darling, D. A., combination of AV tests, 42. 

Darroch, J. N., power of multivariate tests, 
271. 

Das Gupta, S., power functions of multivariate 
tests, 281. 

David, H. A., paired comparisons, 152; poly- 
nomial regression designs, 161. 

Davies, O. L., confounding, 157; evolutionary 
operation, 158. 

Davis, H. T., test in harmonic analysis, 462. 

Decomposition of non-central quadratic forms, 
2-5. 

Dempster, A. P., random allocation experi- 
ments, 130; high-dimensional two-sample 
test, 253; multivariate estimation, 264, 
291, 306. 

Design, problems, 119; equation, 141; see 
Experiments, Surveys. 

Difference-sign test, 355-7, 360, (Exercises 
45.2-3, 45.6) 363-4. 

Discrimination and classification, 314-41; 
linear discrimination, 314-22; geometrical 
interpretation, 317, 326; quadratic, 322; 
testing, 322-3; with cost function, 323-4; 
k populations, 324-6; qualitative data,
326-7; reserved judgement, 327-8; biassed
estimation of misclassification errors, 328-9;
redundant variables, 329; standard
errors of coefficients, 329-30; distribution- 
free methods, 323, 331-5; differences in 
dispersion, 335-6; classification, 336-9. 

Dispersion matrix, latent roots of, 255; 
LR test for, 272; estimated from residuals, 
274, 275-6; see Generalized variance 
(dispersion determinant), Latent roots. 

Distribution-free methods, in AV, 105-11; in 
testing multivariate location, 282; in dis- 
crimination, 323, (Exercises 44.10-11) 341. 

Dixon, W. J., serial correlations, 441-2, 
(Exercise 48.20) 453. 

Doksum, K., robust estimation in AV, 110; 
testing against ordered alternatives, 139. 

Dolby, J. L., transformations, 86. 

Domains of study, 229-34; across strata, 229-32, 
(Exercises 40.15-16) 238; in multi-stage 
sampling, 232-4. 
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Double sampling, see Two-phase sampling. 

Draper, N. R., rotatable designs, 158. 

Duncan, D. B., studentized range tests in AV, 
45-6. 

Dunn, O. J., confidence intervals for contrasts 
in AV, 49, (Exercise 35.14) 55; robustness 
of T² test, 281.

Durbin, J., ranks test for BIB, 151, (Exercise 
38.17) 164; estimation of variance in 
multi-stage sampling, 199, 201, 204, 
(Exercises 39.26-7) 208-9; selection with 
unequal probabilities, (Exercises 39.2, 
39.4) 205; reduction of bias in ratio 
estimator, 216, 217; asymptotic linearity 
of ratio and regression estimators, 223; 
domains of study, 229, (Exercises 40.15-16) 
238; non-response in surveys, (Exercise 
40.8) 236; spectrum and seasonal variation, 
467; moving average and autoregressive 
schemes, 481-5, (Exercise 50.6) 503; 
regression with autocorrelated errors, 
497-8; mixed autoregressive-regressive 

, 499; Yule-Walker equations, 
(Exercise 50.5) 502. 


Eisenhart, C., AV models, 57. 

Eisenpress, H., seasonal variation, 400. 

Ekman, G., formation of strata, 185. 

Elashoff, J. D., discrimination, 329. 

Elashoff, R. M., missing observations, 113,
(Exercise 37.21) 118; discrimination, 329. 

Empty cells in cross-classified data, 30. 

Equivalent samples, 170. 

Ergodic, 407-8, 410; see Time-series. 

Errors, unit, interactive and technical, 80. 

Euler, Latin squares, 134; false conjecture, 136. 

Evolutionary operation, 158. 

Experiments, design of, 119-65; compared with 
surveys, 119, 182; principles, randomiza- 
tion, 120-5; block experiments, 124-30; 
incidence matrix, 125; linear model, 
125-9; AV, 128-9; design, 129-30; with 
two nuisance factors, 132-4; interblock 
information, 146-51; preference, 151-2; 
factorial, 154-5; confounding, 155-8; sequences
of, 158; regression, 158-61; see
Balanced incomplete blocks, Latin squares,
Randomized blocks.


Factor analysis, 306-11, (Exercises 43.11-13)
313; indeterminacy resolved, 307; ML
solution, 308-9; tests for factors, 309-10;
discussion, 310-11; see Component analysis.
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Factorial experiments, 154—5; confounding, 
157; fractionally replicated, 157. 

Fellegi, I., sampling with unequal probabilities, 
171, 174. 

Filter, 423-4; see Time-series. 

Finch, P., autoregressive and moving average 
schemes, 484. 

Finite populations, 166; see Surveys. 
Finney, D. J., combination of AV tests, 42; 
probit and logit transformations, 95. 
Fisher, R. A. (Sir Ronald), originator of AV, 2; 
LSD test in AV, 44; transformation of r, 
92-3; test of symmetry, 106; advocacy of 
randomization, 120; experimental infer- 
ence, 123; BIB inequality, 142; confound- 
ing, 157; latent roots, 258; discrimination, 
(Example 44.1) 317-18, 339; test in
harmonic analysis, 462.

Fisk, P. R., stochastic equations, 500. 

Fortier, J. J., cluster analysis, 337. 

Foster, F. G., distribution of latent roots, 259,
(Example 42.4) 280-1; records test, 360, 
(Exercises 45.8-9) 364-5. 

Freeman, G. H. and Jeffers, J. N. R., non- 
orthogonal three-way cross-classification, 
38. 

Freeman, M. F., transformations, 90,
(Exercise 37.4) 114.

Friedman, H. P., clustering procedures, 337.

Friedman, M., AV using ranks, (Exercise
37.14) 116.


Gabriel, K. R., AV of cell means, 38; simultaneous
and step-by-step tests in AV, 45,
48-9, (Exercises 35.12, 35.17-18) 55-6. 

Gamma distribution, logarithmic transforma- 
tion, (Example 37.2) 91-2. 

Gardiner, D. A., rotatable designs, 158. 

Gassner, B. J., BIB designs, (Exercise 38.20) 
165. 

Gautschi, W., completeness, 62, 67. 

Gaylor, D. W., polynomial regression designs, 
161. 

Geary, R. C., distribution of ratio, 446. 

Geisser, S., multivariate normal theory, (Exer- 
cise 41.5) 260-1. 

General mean, 12. 

Generalized variance (dispersion determinant), 
distribution and moments, (Example 41.3) 
246-8, (Example 41.5) 249-50, (Exercises 
41.8-9) 261; estimation of, 264. 

Ghosh, B. K., sequential AV, 79. 

Ghosh, M. N., Tukey’s test for additivity, 25. 
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Ghosh, S. P., formation of strata, 186; two- 
phase sampling for clustering, 228. 
Girshick, M. A., latent roots, 258, 293-4, 
(Exercise 43.9) 312; canonical correla- 
tion, (Exercise 43.8) 312, (Exercise 43.10) 
313. 

Gleissberg, W., test of randomness, 354. 

Gleser, L. J., sphericity test, 272. 

Godambe, V. P., linear estimation in sample 
surveys, 174. 

Goldman, G. E., discrimination, 329. 

Good, I. J., circulant matrices, 394, (Exercise 
46.13) 402. 

Goodman, L. A., unbiassed ratio estimators,
214, 215, 217.
Graeco-Latin squares, 135-6, (Exercise 38.6) 
162; factorial experiments in, 155. 
Grandage, A. H. E., rotatable designs, 158. 
Granger, C., spectrum analysis, 468, 469, 492. 
Graybill, F. A., decomposition of quadratic 
forms in normal variables, 4; Model II 
AV, 59, 62, 63; inter-block information, 
150. 

Greenberg, V. L., robust estimation in AV, 
111. 

Grenander, U., spectrum analysis, 468; re- 
gression with autocorrelated errors, 498. 

Grundy, P. M., sampling with unequal 
probabilities, 173, (Exercise 39.11) 206-7. 

Guérin, R., review of BIB and PBIB designs, 
142-3, 153. 

Guest, P. G., polynomial regression designs, 
161. 


Haavelmo, T., systems of equations, 499.

Hader, R. J., rotatable designs, 158. 

Hájek, J., limiting normality, 169; rejective 
sampling, 174. 

Hanani, H., BIB inequality, 142. 

Hannan, E. J., spectrum and seasonal varia- 
tion, 467; regression with autocorrelated 
errors, 497. 

Hansen, M. H., sample surveys, 166; choice of 
selection probabilities, 202; non-response 
in sample surveys, (Exercise 40.7) 236. 

Hanurav, T. V., sampling with unequal prob- 
abilities, 174. 

Harman, H. H., factor analysis, 310. 

Harmonic analysis, see Spectrum. 

Harter, H. L., multiple comparisons methods, 
46. 

Hartley, H. O., combination of AV tests, 
40-3; Newman-Keuls test, 46, (Exercise 
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35.10) 54; pooling in Model II AV, 
(Example 36.6) 68; ML in mixed AV 
model, 79; unbiassed ratio estimators, 
212, 214, 215, 217; domains of study, 229; 
random formation of strata, (Exercise 40.6) 
236. 

Harville, D. A., estimation in unbalanced
Model II AV, 74, 81. 

Hatanaka, M., spectrum analysis, 468,
492.

Healy, M. J. R., transformation tables, 87; 
data on male premolars, (Example 42.4) 
280-1. 

Henderson, C. R., estimation in unbalanced 
Model II AV, 73. 

Herbach, L. H., testing hypotheses in Model 
II AV, 68, (Exercises 36.5-6) 83. 

Hess, I., formation of strata, 186; ratio estima- 
tion, 223. 

Hext, G., spectrum analysis, 467. 

Hierarchical classification, see Classification,
hierarchical. 

Higham, J. A., trend fitting, (Exercise 46.6) 
401. 

Hills, M., discrimination, 327, 329. 

Hodges, J. L., “aligned” ranks test, 108; 
robust estimators in AV, 110; formation 
of strata, 185. 

Hoel, P. G., regression designs, 161; distribu- 
tion of dispersion determinant, (Exercise 
41.8) 261. 

Hogg, R. V., nested hypotheses, (Exercise 
37.2) 114; testing degree of polynomial 
regression, (Exercise 37.3) 114; power of 
tests, 281. 

Hollander, M., testing against ordered alterna- 
tives, 139. 

Holloway, L. N., robustness of T² test, 281.

Holt, C. C., exponential weighting, 501. 

Homogeneity, LR tests of, 87, (Exercises 
37.1-3) 113-14, 264-9, (Example 42.1) 
270, (Example 42.2) 272-3, (Exercises 
42.1-3) 282. 

Hopkins, C. E., discrimination, 327, (Example 
44.5) 328-9. 

Horsnell, G., robustness of AV, 98. 

Horvitz, D. G., sampling with unequal 
probabilities, 173. 

Hotelling, H., T² as AV test, 79; variance-stabilization,
92; T² distribution, 250,
253, (Exercise 41.12) 262; geometrical
interpretation, 251-2; T² and R², 252;
T² and F, 252; one-sample T² test, 252;
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two-sample T? test, 253; two-sample Т? 
and D², 260; non-central T², 259, 281;
T₀² test for several samples, 281-2;
robustness and power of T², 281-2,
(Exercises 42.11, 42.13-14) 283; T² as LR
test, (Exercise 42.10) 283; T² in generalization
of Scheffé's test, (Exercise 42.12)
283; canonical correlations, 299, (Example 
43.5) 303-4, 305, (Exercises 43.5-7) 
311-12. 

Howe, W. G., factor analysis, 309. 

Hoyland, A., robust estimators in AV, 110. 

Hsu, P. L., latent roots, 258; multivariate Beta 
distribution, 260. 

Hultquist, R. A., Model II AV, 59, 62, 63. 

Hunter, J. S., response surfaces, 158. 

Hurwitz, W. N., sample surveys, 166; choice
of selection probabilities, 202; non-response
in sample surveys, (Exercise
40.7) 236.


Imhof, J. P., mixed models in AV, 77, 79. 

Incidence matrix, of an experiment, 125. 

Independence, LR test of, (Exercises 41.10-11) 
261-2, 270-1, 281, (Exercise 42.4) 282; 
power of various tests, 282; equivalent to 
testing equality of latent roots, 291. 

Index number, from component analysis, 295. 

Intensity, 411; see Spectrum, Time-series. 

Interactions in AV, 13, 36; Tukey’s test for, 
(Example 35.3) 23; zero for any weights if 
for one set, 26; independent and tied, 
75-6; unit-treatment, 80. 

Interactive errors, 80. 

Inter-block information, 146-51, (Exercises 
38.14-16) 164. 

Inverse sampling, (Exercise 40.3) 235. 

Ito, K., robustness of T² test, 281.


James, A. T., latent roots, 260. 

Jayachandran, K., powers of tests for mean- 
vectors, and tests of independence, 282. 

Jeffers, J. N. R., see Freeman, G. H. 

Jenkins, G. M., joint distribution of serial 
correlations, 437, 449, (Exercises 48.18-19)
453; non-null distribution of serial
correlations in Markoff case, 444, 445, 
(Exercise 48.17) 452; spectrum analysis, 
466; exponential weighting, 501. 

John, S., discrimination, (Example 44.5) 329, 
(Exercise 44.2) 340. 

Johnson, A. H. L., square root transformations, 
(Exercise 37.15) 117. 
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Johnson, N. L., sequential AV, 79; quota 
sampling, (Exercise 40.11) 237-8; variate- 
difference method, 391. 


Karlin, S., optimal experiments, 130. 

Kelley, T. L., psychological data for canonical 
analysis, (Example 43.5) 303. 

Kempthorne, O., models in AV, 75; complete 
randomization, 81; expected mean squares 
in randomized blocks and Latin squares, 
138; BIB analyses, 146; PBIB analysis, 
153; lattice designs, 154; confounding, 
157; sequences of experiments, 158. 

Kendall, D. G., logarithmic transformation, 
92. 

Kendall, M. G., AV using ranks, (Exercise 
37.14) 116; n-dimensional geometry, 243 
f.n.; computation in component ana- 
lysis, 289; ranking for principal com- 
ponents, 295; discarding variables, 295; 
factor analysis, 310; classification, (Ex- 
ample 44.7) 338-9; central limit for moving 
average weights, 370 f.n.; bias in serial 
correlations, 435, (Exercises 48.4, 48.11) 
450-1; distribution of serial correlation in 
Markoff case, 444. 

Keuls, M., studentized range test in AV, 
45-6, (Exercise 35.10) 54. 

Khamis, S. H., sample survey theory, 170, 
(Exercise 39.1) 204. 

Khan, S., stratified sampling, 183. 

Khintchin, A., ergodic theorem, 407, 410. 

Kiefer, J., optimal experiments, 130; regression 
designs, 158. 

King, B., clustering procedures, 337. 

Kish, L., estimation of variance, (Exercise 
39.16) 207; ratio estimation, 223. 

Kokan, A. R., stratified sampling, 183. 

Koop, J. C., linear estimation in sample 
surveys, 174. 

Koopmans, T. C., serial correlations, 442. 

Korin, B. P., LR test for dispersion matrix, 272. 

Kruskal, J. B., monotone transformations, 88. 

Kshirsagar, A. M., multivariate Beta distri- 
bution, 260; Bartlett decomposition, (Exer- 
cises 41.16-17) 262-3. 


Lahiri, D. B., selection with unequal proba- 
bilities, (Exercise 39.11) 206-7; removal 
of bias in ratio estimators, 223. 

Latent roots of a dispersion matrix, null 
distribution, 255-8, 259, 260, (Exercise 
41.14) 262; (Example 42.4) 280-1; testing 
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equality equivalent to testing independ- 
ence, 291, (Example 43.3) 292; testing 
zero value, 292; testing equality of small 
roots, 292-3; large-sample results, 293-4; 
in discrimination, 326; see Canonical 
correlations, Component analysis, Factor 
analysis. 

Latin squares, 134-7, (Exercises 38.5-7) 162; 
robustness of normal theory, 138-9; 
factorial experiments in, 155. 

Lattice designs, 153-4. 

Laubscher, N. F., transformations, (Exercises 
37.4-5) 114-15. 

Lawley, D. N., distribution of LR statistic, 269; 
Tw test, 281; component analysis, (Ex- 
ample 43.2) 290-1, 293; canonical correla- 
tions, 305, 306; factor analysis, 308, 310, 
(Exercise 43.11) 313. 

Least Significant Difference (LSD) test, in AV, 
43-4. 

Least squares, in sampling without replacement, 
167-8; in discrimination, 323, (Exercises 
44.10-11) 341; for moving averages, 
366-7, 374-5; in autoregressive series,
476, 499; see Linear model. 

Ledermann, W., component analysis, 285. 

Lehmann, E. L., “aligned” ranks test, 108;
robust estimators in AV, 110; multi- 
variate tests, 281. 

Leipnik, R. B., serial correlation, 444, 445. 

Levene, H., tests of randomness, 354, 355, 
(Exercises 45.6-7) 364. 

Levine, A., polynomial regression designs, 161. 

Lewis, T., canonical analysis in educational 
research, 305. 

Lewyckyj, R. J., BIB tables, 143-4. 

Likelihood Ratio (LR) tests, in Model II AV, 
(Exercises 36.5-6) 82-3; for “nested” 
hypotheses, 87, (Exercises 37.1-3) 113-14; 
in multivariate analysis, 265-84; see 
Dispersion matrix, Homogeneity, In- 
dependence, Regression, Sphericity. 

Linear model, AV in (Model I), 1-56; decom- 
position of non-central quadratic forms, 
2-5; removal of singularity, 12; choice of 
weights, 25-6; general disproportional 
frequencies case, 38; combination of 
tests, 40-3; multiple comparisons, 43-9; 
analysis of covariance, 49-52; extension 
of model to further parameters, 51-2; 
transformations to, 85-8; missing obser- 
vations, 111-13, (Exercise 37.21) 118; for 
block experiments, 125-9; in sampling 
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without replacement, 167-8; multivariate, 
273-6, (Example 42.3) 277-80; see Analysis 
of Variance, Classification, one-way, etc. 

Linhart, H., discrimination, 327. 

Lipton, S., data on male premolars, (Example 
42.4) 280-1. 

Logarithmic transformations, (Examples 37.2, 
37.4) 91, 93; 95-6. 

Logit transformations, 94. 

Lomnicki, Z. A., test for normality of a 
stationary process, (Exercise 49.8) 470. 

Low, L. Y., estimation in unbalanced Model 
II AV, 73. 

LR, see Likelihood Ratio.

LSD, see Least Significant Difference. 

Lubischew, A. A., discrimination, 336. 


McDonald, B. J., rank sum multiple com- 
parisons, 46. 

Madow, W. G., sample surveys, 166; limiting 
normality, 169; serial correlation, 443. 

Mahalanobis, P. C., D² statistic and generalized
distance, 259-60. 

Main effects in AV, 12, 36. 

Manley, G., meteorological data, (Example 
43.4) 295. 

Mann, D. W., discarding variables, 295. 

Mann, H. B., Latin Squares, 137; construction 
of BIB, 143; confounding, 157; difference- 
sign test, 357, (Exercise 45.3) 363; rank 
correlation test, 358; LS in autoregressive 
series, 476. 

Markoff series, autocorrelations, (Example 47.2) 
405; backwards, (Example 47.3) 406; 
correlogram and spectrum, (Example 47.7) 
418-19; partial autocorrelations, 424-5; 
cumulants and normality, (Exercise 47.2) 
426; (Exercise 47.3) 427; grouping, 
(Exercise 47.15) 428; standard error of 
serial correlations, (Example 48.3) 432; 
covariance of serial correlations, (Example 
48.4) 433; bias in serial correlation, 
(Example 48.7) 435, (Exercises 48.4-5, 
48.8, 48.11) 450-1; to higher order, 435; 
non-null distribution of serial correla- 
tions, 443-4, 447-9, (Exercises 48.13, 48.17) 
451-2; effect of starting-point, 472-4; 
multivariate, 496; in forecasting, 501. 

Marriott, F. H. C., bias in serial correlations, 
435. 

Marsaglia, G., decomposition of quadratic 
forms in normal variables, 4. 

Mauchly, J. W., sphericity test, 271. 
Mauldon, J. G., multivariate estimation para- 
doxes, 281. 

Maxwell, A. E., component analysis, (Example 
43.2) 290-1; factor analysis, 308, 310, 
(Exercise 43.11) 313. 

Mean, moments of, 168-9. 

Mean squares (MS), expected values of in 
Model II AV, 63, 69. 

Median tests in AV, 108-11, (Exercises 
37.18-20) 117-18. 

Mehra, K. L., “aligned” ranks test, 108. 

Mickey, M. R., unbiassed regression-type 
estimators, 219. 

Mill, J. S., on experiments, 120. 

Minimum variance (MV) allocation, in stratified 
sampling, 180, 183. 

Missing observations, 111-13, (Exercise 37.21) 
118. 

Mixed models, 77-9; for recovery of inter- 
block information, 146-51. 

Models I, II; see Analysis of variance. 
Mood, A. M., median tests, 109-11, (Exercises 
37.18-20) 117-18; latent roots, 258. 
Moore, G. H., tests of randomness, 354, 356. 
Moran, P. A. P., Slutzky sinusoidal limit, 415; 
moments of serial correlations, 435-7, 
449, (Exercises 48.5-6, 48.12, 48.21) 
450-453. 

Mostafa, M. G., tests in unbalanced Model II 
AV, 74. 

Mosteller, F., tables of transformations, 90, 
(Exercise 37.4) 114. 

Moving average, 367-402; as LS polynomial, 
366-7; formulae to degree 5, 368-9; 
formulae in terms of differences, 370; 
Spencer’s 15- and 21-point formulae, 
(Examples 46.3-4) 372; end-effects, 373-4; 
using orthogonal polynomials, 374-5; see 
Moving average series, Seasonal variation, 
Trend. 

Moving average series, 412-16; and auto- 
regressive series, 417, 474-6; estimates 
and tests of fit, 481-4, (Exercise 50.6) 
503; as errors in autoregressive series, 
484-6; exponentially weighted, 501; see 
Autoregressive series, Time-series. 

MS, see Mean squares. 

Mudholkar, G. S., multivariate estimator in 
surveys, 216; power functions of multi- 
variate tests, 281. 

Muller, E.-R., BIB designs, 143. 

Multinormal distribution, multivariate normal 
distribution; see Multivariate analysis. 


Multi-phase sampling, 228. 

Multiple comparisons in AV, 43-9. 

Multi-stage sampling, 189-204, (Exercises 39.24, 
39.28-9) 208-9; estimator, 191; with 
equal probabilities, 191-4; with unequal 
probabilities, 195-7; estimation of vari- 
ance, 197-201, 223-4; cost function and 
minimum variance, 201-2; choice of 
probabilities, 202-4; efficiency, 204; strati- 
fication, 204; ratio and regression esti- 
mators, 223-4; domains of study, 232-4. 

Multivariate analysis, 239-341; in time-series, 
486-96; see Canonical variables, Com- 
ponent analysis, Correlation determinant, 
Discrimination and classification, Dis- 
persion matrix, Factor analysis, Gener- 
alized variance, Homogeneity, Hotelling 
T², Independence, Latent roots, Regres- 
sion, Sphericity, Wishart distribution. 

Murteira, B., variate-difference method, (Exer- 
cises 48.15-16) 452. 

Murthy, M. N., sample survey theory, 166; 
sufficiency in surveys, 176, (Exercises 
39.30-1) 209-10; unbiassed ratio estimators, 
223, (Exercise 40.1) 234. 

Murty, V. N., inequality for BIB, (Exercise 
38.13) 163. 

MV, see Minimum variance. 


Nair, K. R., PBIB, 153. 

Nanjamma, N. S., unbiassed ratio estimators, 
223, (Exercise 40.1) 234. 

Narain, R. D., tests of independence unbiassed, 
271. 

Negative binomial distribution, angular trans- 
formation, 95, (Exercise 37.5) 114. 

Nerlove, M., spectrum analysis, 467. 

Nested classification, 31 f.n.; see Classifica- 
tion, hierarchical. 

Nested hypotheses, 87, (Exercises 37.1-3) 
113-14. 

Newman, D., studentized range test in AV, 
45-6, (Exercise 35.10) 54. 

Neyman, J., transformation bias, 95; stratified 
sampling, 180; two-phase sampling, 224. 

Nieto de Pascual, J., unbiassed ratio estimators, 
213, 215, 217, 223. 

Noether, G. E., ranks test for BIB, 151; rank 
serial correlation test, 360. 

Non-central quadratic forms, decomposition 
of, 2-5. 

Normal distribution, logarithmic transforma- 
tion of sample variance, (Examples 37.2, 
37.4) 91, 93; square root transformation of 
sample variance, (Example 37.5) 94; see also 
Bivariate normal, Multivariate analysis. 
Normal scores, transformation to, 94; AV 
using, 105, 107-8. 
Normalizing transformations, 93-4. 
Norton, H. W., review of Latin squares, 137. 
Nuisance factors, 124; two, 132; three or more, 
135-7. 


Ogawa, J., robustness of F-test in randomized 
blocks and BIB, 139, 151. 

Olkin, I., multivariate ratio estimators, 216, 223. 

One-way classification, see Classification, one- 
way. 

Orcutt, G. H., autocorrelated errors in re- 
gression, 497-8. 

Orthogonal squares, 136, (Exercises 38.6-7) 
162; factorial experiments in, 155. 


Paired comparisons, 152. 

Parker, R., Euler's false conjecture, 136. 

Partially balanced incomplete blocks (PBIB), 
152-3. 

Parzen, E., spectrum analysis, 466, 467, 
(Exercises 49.6-7) 470. 

Pathak, P. K., sufficiency in sample survey 
theory, 170-1, (Exercise 39.30) 209, 223, 
(Exercise 40.3) 235. 
Patterson, H. D., sampling on successive 
occasions, (Exercises 40.9-10) 236-7. 
PBIB, see Partially balanced incomplete blocks. 
Pearce, S. C., review of non-orthogonal AV, 
38. 

Pearson, E. S., homogeneity tests, (Example 
42.2) 272. 

Periodogram, 411-12; see Spectrum, Time- 
series. 

Permutation tests, in AV, 105-8, 138-9, 151. 

Phases, in time-series, 353-5, (Exercise 45.1) 
363; in harmonic analysis, 454; see Two- 
phase sampling, Multi-phase sampling. 

Phillips, A. W., multivariate time-series, 489. 

Pillai, K. C. S., distribution of latent roots, 
259, 260; powers of tests for mean-vectors, 
and tests of independence, 282. 

Pitman, E. J. G., permutation test in AV, 
106-7, (Exercise 37.13) 116. 

Plackett, R. L., models in AV, 75, 81; dupli- 
cated observations in AV, 113. 

Please, N. W., discrimination, 336. 

Poisson distribution, square root transforma- 
tions, (Example 37.1) 89-90, (Exercises 
37.15-17) 117; sampling, (Exercises 39.13, 
39.23) 207-8. 

Polynomial, testing degree in regression, 
(Exercise 37.3) 114; regression designs, 
158-61. 

Pooling procedures, in AV, (Example 36.6) 
68; in regression, (Exercise 37.3) 114. 

Pope, J. A., bias in serial correlations, 435. 

Posten, H. O., power of L.R. test, 282. 

Preference experiments, 151-2. 

Principal components, 287; see Component 
analysis. 

Probabilities proportional to size (p.p.s.), 
195-7, 204; see Unequal probabilities, 
Surveys. 

Probit transformations, 94. 

Product estimator, (Exercise 40.2) 234-5. 

Puri, M. L., robust estimators in AV, 111; 
multivariate ranks test, 282. 


Quade, D., rank analysis of covariance, 105. 
Quadratic forms, decomposition of, 2-5. 
Quasi-random sampling, 188. 

Quenouille, M. H., sequences of experiments, 
158; method of bias-reduction, 216, 264, 
306, 435; variate-difference method, 393, 
(Exercises 46.7-11) 401; trend-fitting, 
393-4, 396, (Exercise 46.12) 402; large- 
sample theory of serial correlations for 
autoregressive series, 433, (Exercise 48.14) 
452; non-null distribution of serial correla- 
tion in Markoff case, 444, transformed, 
(Exercise 48.13) 451-2; joint distribution 
of serial correlations, 449; robustness of 
serial correlation theory, 449; unequal 
time-intervals in time-series, 469; partial 
autocorrelations and test of fit in auto- 
regressive series, 478; multivariate time- 
series, 486-9, 492; series with common 
errors, (Exercise 50.8) 503. 

Quota sampling, (Exercise 40.11) 237-8. 


Raj, D., sufficiency in surveys, 170, (Exercise 
39.1) 204; unequal probabilities, 176, 
(Exercises 39.9-10) 206; two-phase samp- 
ling for probabilities, 228. 

Rajalakshman, D. V., multivariate time-series, 
496. 

Randomization, complete, 79; in experiments, 
120-5. 

Randomized blocks, 79-80, 130-2, (Exercises 
38.3-4) 162; robustness of normal theory, 
138-9; factorial experiments in, 155. 
Randomness, tests of, 360; see Difference- 
sign, Phases, Rank correlation, Records, 
Serial correlation, Turning-points. 

Range tests in AV, 44-6. 

Rank, transformations, 94; AV using, 105, 
107-9, (Exercises 37.13-14) 116; test in 
BIB, 151; correlation tests in time-series, 
357-60; serial correlation test, 360, (Exer- 
cise 45.5) 363. 

Rao, C. R., BIB designs, 143-4; BIB analyses, 
146; PBIB analysis, 153; discrimination, 
322, 324, (Example 44.4) 325-6. 

Rao, J. K., ML in mixed AV model, 79; 
sampling with and without replacement, 
171; unequal probabilities, 174; reduction 
of bias in ratio estimator, 216; random 
formation of strata, (Exercise 40.6) 236. 

Rao, P. S. R. S., multivariate estimator in 
surveys, 216. 

Ratio estimators, biassedness, 211-12, 216-18, 
222-3; consistency, 212; modified, 212-13, 
(Exercises 40.1-2) 234-5, (Exercises 
40.13-14) 238; variance comparisons, 213-18; 
in stratified and multi-stage sampling, 
223-4; asymptotically linear, 223-4; in 
two-phase sampling, 227-8. 

Realization, 404. 

Recognizable individuals, in sample survey 
theory, 166, 170-1, 174-5. 

Records test, 360, (Exercises 45.8-9) 364-5. 

Recovery of inter-block information, see Inter- 
block information. 

Rees, D. H., non-orthogonal additive multi- 
way cross-classification, 38; distribution 
of latent roots, 259, (Example 42.4) 
280-1. 

Regression, testing degree of polynomial, 
(Exercise 37.3) 114; transformations, 
(Exercise 37.9) 115; designs, 158-61; in 
multivariate analysis, 273-6, (Example 
42.3) 277-80, (Exercises 42.15-16) 283-4; 
with autocorrelated errors, 497-8; see 
Autoregressive series. 

Regression estimators, 218-19; unbiassed, 
219-22; in stratified and multi-stage 
sampling, 223-4; asymptotically linear, 
223-4; in two-phase sampling, 227-8. 

Rejective sampling, 174. 

Replacement, sampling with and without, 166; 
see Surveys. 

Residuals, analysis of, 96-7; dispersion matrix 
of, 274, 275-6. 

Response surfaces, 158. 




Rhodes, E. C., trend-fitting, 393. 

Robson, D. S., ratio estimators, 214, 223; 
product estimator, (Exercise 40.2) 235-6. 

Robustness, of AV procedures, 97-108, 110-11. 

Romanovsky, V., Slutzky sinusoidal limit, 
415. 

Rosenblatt, M., spectrum analysis, 468; re- 
gression with autocorrelated errors, 498. 

Ross, A., allocation in stratified sampling, 
(Exercise 39.20) 208; unbiassed ratio 
estimators, 212. 

Rotatable designs, 158. 

Roy, J., inter-block information, 150. 

Roy, S. N., mixed model in AV, 79; distri- 
bution of latent roots, 258, 259; Mahala- 
nobis's D² statistics, 259. 

Rubin, H., serial correlations, 442. 

Rubin, J., clustering procedures, 337. 


Sampford, M. R., selection with unequal 
probabilities, (Exercise 39.2) 205; inverse 
sampling, (Exercise 40.3) 235. 

Sample surveys, see Surveys. 

Sarangi, J., “aligned” ranks test, 108. 

Satterthwaite, F. E., approximate F-test in 
AV, (Exercise 36.7) 83; random balance 
experiments, 130. 

Schatzoff, M., distributions of LR criteria, 268, 
280; comparison of tests for mean- 
vectors, 282. 

Scheffé, H., Tukey’s test for additivity, 25; 
interactions zero for any weights if for 
one set, 26; analysis of cross-classified 
data with empty cells, 30; three-way 
hierarchical classification, 34; multiple 
comparisons, 46; simultaneous confidence 
intervals for all contrasts, 48-9, (Exercises 
36.11-13) 54-5; analysis of covariance, 52; 
expected MS in AV, 69; Model II three- 
way hierarchical classification, 71; con- 
fidence intervals in Model II AV, 71, 
(Exercise 36.15) 84; models for AV, 75; 
mixed model, 75, 77, 79; robustness of 
AV, 98; AV of cell means, (Exercise 37.7) 
115; interaction in Latin squares, 135; 
robustness in randomized blocks and 
Latin squares, 138-9; problem of two 
means, 281. 

Schmetterer, L., transformation bias, 95. 

Schull, W. J., robustness of T² test, 281. 

Scott, E. L., transformation bias, 95. 

Searle, S. R., estimation in unbalanced Model 
II AV, 73. 
Seasonal variation, 349-50, 396-400, 403; and 
spectrum, 467-8; see Moving average, 
Trend. 

Seber, G. A. F., orthogonality in AV, 37; 
power of multivariate tests, 281. 

Self-weighting designs, 195, 202. 

Sen, A. R., selection schemes with unequal 
probabilities, (Exercises 39.5-6) 205-6. 

Sen, P. K., robust estimation in AV, 111; 
efficiency of permutation test in random- 
ized blocks, 151; multivariate ranks test, 
282. 

Sequential analysis of variance, 79. 

Serial correlation, using ranks, 360, (Exercise 
45.5) 363; generally, 361-2, (Exercises 
45.10-11) 365; and variances of differ- 
ences, 391-2; and variate-difference 
method, 393, (Exercises 46.10-11) 401; 
large-sample theory, 431-3, (Exercises 
48.3, 48.9-10) 450-1; bias, 433-5; exact 
moments, 435-7, (Exercises 48.5-6, 48.12, 
48.18-19) 450-3; distribution in normal 
case, 437-49, (Exercise 48.20) 453; trans- 
formations, 445, (Exercise 48.13) 451-2; 
see Autocorrelation. 

Seshadri, V., inter-block information, 150. 

Sethi, V. K., formation of strata, 186; un- 
biassed ratio estimators, 223. 

Shah, K. R., inter-block information, 150-1. 

Sharma, D., Tukey’s test for additivity, 25. 

Shiskin, J., seasonal variation, 400. 

Shorack, G. R., ordered alternatives in AV, 49. 

Shrikande, S. S., Euler’s false conjecture, 136; 
PBIB, 153. 

Sillitto, G. P., BIB tables, 143-4. 

Silvey, S. D., power of tests, 281. 

Simaika, J. B., power of T² test, 281. 

Simultaneous test procedures, 44-9, (Exercises 
35.11-14, 35.16-17, 35.19) 54-6. 

Singh, M. P., two-phase sampling, 228. 

Siotani, M., confidence intervals for contrasts in 
AV, 49. 

Sitgreaves, R., discrimination, (Exercise 44.2) 
340. 

Slater, P., discrimination data on neurotics, 
(Example 44.4) 325-6. 

Slutzky-Yule effect, 378; sinusoidal limit 
theorem, (Example 47.6) 414-15. 

Smith, B. Babington, AV using ranks, (Exercise 
37.14) 116. 

Smith, C. A. B., quadratic discrimination, 322. 

Smith, K., polynomial regression designs, 161, 
(Exercise 38.19) 165. 
Snedecor, G. W., Yates' method of weighted 
squares of means, 30. 

Solomon, H., cluster analysis, 337. 

Spectrum, spectral density, spectral function, 
410-11; as autocorrelation g.f., 410; of 
Markoff and Yule series, (Examples 47.7-8) 
418-20; for continuous series, 422; effect 
of filtering, 423-4; analysis, 454-71; 
harmonic analysis, 454-5; Nyquist fre- 
quency, aliases, 455-7; effect of a harmonic 
component, 458-60; effect of other perio- 
dicities, trend, 460-1; test for the spectral 
ordinate, 461-2; smoothing, 463-4; cal- 
culation of, 464-6; estimation of density, 
466-7; and seasonal variation, 467-8; 
unequal time-intervals, 468-9; cross- 
spectra, 491-6; coherence, 491; poly- 
spectra, 492; see Time-series. 

Spencer’s 15- and 21-point formulae, (Ex- 
amples 46.3-4) 372. 

Sphericity test, 271-2, (Example 43.3) 
292. 

Spjotvoll, E., unbalanced Model II AV, 73, 
(Exercise 36.10) 84. 

Split-plot designs, 157. 

Sprott, D. A., BIB designs, 143. 

Square root transformations, (Example 37.1) 
89-90, 95, (Exercises 37.15-17) 117. 
Srivastava, S. R., pooling procedures in 
Model II AV, (Example 36.6) 68. 

Stabilization of variance, 88-92. 

Stages, see Multi-stage sampling. 

Stationary time-series, 404; see Time-series. 

Stein, C., recovery of inter-block information, 
151. 

Step-by-step AV test procedures, 42-6, (Exer- 
cise 35.18) 56. 

Stevens, W. L., non-orthogonal three-way 
cross-classification, 38. 

Stratified sampling, 177-87, (Exercises 
39.13-15, 39.17-21) 207-8; motivation for, 
177-9; choice of sample sizes, 180-2; 
strata and blocks, 182; MV allocation for 
fixed cost, 183; formation of strata, 
183-6, (Exercises 40.4-6) 235-6; estima- 
tion of effect, 186-7; and clustering, 188-9; 
in multi-stage sampling, 204; ratio and 
regression estimators, 223-4; with two 
phases, 224-6; domains of study, 232-4, 
(Exercises 40.15-16) 238; quota sampling, 
(Exercise 40.11) 237-8. 

Stuart, A., random formation of strata, 
(Exercises 40.4-6) 236; difference-sign 
test, 357, 360; records test, 360, (Exercises 
45.8-9) 364-5; turning-points test, 360, 
(Exercise 45.4) 363; rank correlation tests, 
360; rank serial correlation tests, 360, 
(Exercise 45.5) 363. 

Studden, W. J., optimal experiments, 130. 

“Student” (W. S. Gosset), LSD test in AV, 
43. 

Studentized range tests in AV, 44-6. 

Successive occasions, sampling on, (Exercises 
40.9-10) 236-7. 

Sufficiency, in Model II AV, 62, 73, (Exercise 
36.12) 83; in sample survey theory, 
170-1, 176, (Exercise 39.1) 204, (Exercises 
39.30-1) 209-10. 

Sugiyama, T., distributions of latent roots, 259. 

Supplementary information, 211-38; see Ratio 
estimators, Regression estimators, Two- 
phase sampling. 

Surveys, compared with experiments, 119, 
182; theory, 166-238; random sampling 
without replacement, 167-8; moments of 
sample mean, 168-9; sufficiency, 170-1; 
see Domains of study, Multi-stage samp- 
ling, Ratio estimators, Regression esti- 
mators, Stratified sampling, Two-phase 
sampling, Unequal probabilities. 

Sweeny, H. C., polynomial regression designs, 
161. 

Systematic sampling, 187-8. 


T², T₀², see Hotelling. 

Takeuchi, K., BIB designs, 143-4. 

Tamura, R., multivariate distribution-free 
location tests, 282. 

Taylor, L. R., transformation tables, 87. 

Technical errors, 80. 

Thompson, D. J., sampling with unequal 
probabilities, 173. 

Thompson, W. A., Jr, rank sum multiple 
comparisons, 46; negative estimates of 
variance in Model II AV, 71. 

Tidwell, P. W., transformations, 86, (Exer- 
cise 37.9) 115. 

Tied interactions, 75-6. 

Time-series, 342-503; general, 342-8; com- 
ponents of, 349-50, 366; tests of random- 
ness, 350-61; moving averages, 367-402; 
stationary, 403-4; ergodic, 407-8, 410; 
intensity, 411; periodogram, 411-12; mov- 
ing average series, 412-16; autoregressive 
series, 416-21; continuous series, 421-3; 
filters and transfer functions, 423-4; in- 
finite and circular processes, 425-6; serial 
correlations, 360-2, 431-53; spectrum 
analysis, 410-11, 454-71; estimation and 
testing in autoregressive and moving aver- 
age series, 472-86, 497-500; multivariate, 
486-96; systems of equations, 496-7; fore- 
casting, 500-2; see Autocorrelation, Auto- 
regressive, Markoff, Moving average series, 
Randomness, tests of, Serial correlation, 
Spectrum, Trend, Variate-difference, Yule 
series. 

Tin, M., ratio estimators, 217-18, (Exercises 
40.13-14) 238. 

Tintner, G., variate-difference method, 390, 
391. 

Tocher, K. D., missing observations, 112; 
other spoilt experiments, 113; block ex- 
periments, 124, 140, (Exercises 38.2-3, 
38.8-9) 162-3; inter-block information, 
150, (Exercise 38.15) 164. 

Transfer function, 423-4; see Time-series. 

Transformations, to the normal linear model, 
85-8; purposes of, 87-8; monotone, 88; 
variance-stabilizing, 88-92; normalizing, 
93-4; to additivity, 94-5; removal of bias, 
95-6; analysis of residuals, 96-7; see also 
Angular, Logarithmic, and Square Root 
transformations. 

Treatments, 124; AV for, 155. 

Trend, 349-50, 366; tests against, 355, 360; 
effect of elimination by moving averages 
(Slutzky-Yule effect) 375-84, 393-6, 
(Exercise 46.12) 402; see Moving average. 

Tryon, R. C., cluster analysis, 337. 

Tschuprow, A. A., stratified sampling, 180. 

Tukey, J. W., test for additivity, 25; multiple 
comparisons, 43-9; studentized range 
tests, 44, 45; simultaneous confidence 
intervals for all differences, contrasts, 
combinations, 46-7, (Exercise 35.13) 55; 
expected MS in AV, 69; estimation in 
unbalanced Model II AV, 73; models in 
AV, 75, 77; moments of variance esti- 
mators in AV, (Exercise 36.10) 83; trans- 
formations, 90, (Exercise 37.4) 114; 
analysis of residuals, 96; spectrum analysis, 
466, 467, 468, (Exercise 49.5) 469. 

Turning-points test, 351-2, 353, 354, (Exercise 
45.4) 363. 

Two-phase sampling, 224-8; for stratification, 
224-6; cost function, 226-7; for ratio 
estimation, 227-8; for regression estima- 
tion, 228. 
Unequal probabilities, in sampling without 
replacement, 171-6, (Exercises 39.2-11) 
205-7, (Exercises 39.30-1) 209-10; linear 
estimation, 173-5; with replacement, 176- 
177; and stratification, 177-9; and cluster- 
ing, 187-9; and multi-stage sampling, 189- 
190, 195-204; p.p.s. sampling, 195-7; esti- 
mation of sampling variance, 197-201; 
chosen to minimize variance, 202-4; 
chosen to remove bias, 223-4; two-phase 
sampling to determine, 228; see Multi-stage 
sampling, Stratified sampling, Surveys. 

Uniform sampling fraction (USF), 180. 

Unit errors, 80. 

USF, uniform sampling fraction, 180. 


Vajda, S., Latin squares, 137; BIB designs, 143; 
PBIB designs, 153. 

Van Elteren, P., ranks test for BIB, 151. 

Variance, see Analysis of variance. 

Variance-stabilizing transformations, 88-92. 

Variate-difference method, 384-93, (Exercises 
46.7-11) 401, (Exercises 48.15-16) 452. 

Verhagen, A. M. W., proof of Scheffé’s all- 
contrasts method, (Exercise 35.19) 56. 

Vithayasai, C., ratio estimators, 223. 

Vos, J. W. E., sampling in time and space, 
(Exercise 40.10) 237. 


Wagle, B., latent roots distributions, 259. 

Wald, A., confidence intervals in unbalanced 
Model II AV, 73, (Exercise 36.16) 84; 
discrimination, 339, (Exercise 44.2) 340; 
rank serial correlation test, 360; LS in 
autoregressive series, 476. 

Walker, A. M., autoregressive and moving 
average schemes, 484. 
Walker, G., equations for autoregressive series, 
417; test in harmonic analysis, 461. 
Wallis, W. A., tests of randomness, 354, 
356. 

Wang, Y. Y., estimation in Model II AV, 65. 

Ward, D. H., exponential weighting, 501. 

Watson, G. S., robustness of AV, 98, (Exer- 
cises 37.10-12) 115-16; joint distribution 
of serial correlations, 449; regression with 
autocorrelated errors, 497. 

Webster, J. T., reduction of bias in ratio 
estimator, 216. 

Weeks, D. L., inter-block information, 150. 

Weights, choice of, in AV for linear model, 
25-6. 

Welch, B. L., robustness of AV, 103, 139. 
White, J. S., bias in serial correlations, 435; 
moments of serial correlation in Markoff 
case, 445. 

Whittaker, E. T., periodogram, (Exercise 49.10) 
470-1. 

Whittle, P., autoregressive series, (Exercise 
47.16) 428; autocorrelation matrix, (Exer- 
cise 47.18) 429; estimating spectral density, 
467; moving-average series estimator, 
(Example 50.1) 475-6; estimation and 
testing in time-series, 486. 

Wiener, N., autocorrelation function, 422. 

Wijsman, R., Bartlett decomposition, (Exer- 
cises 41.17-18) 263. 

Wilk, M. B., models in AV, 75; complete 
randomization, 81; expected MS in Latin 
squares, 138. 

Wilkinson, G. N., missing observations, 112-13. 

Wilks, S. S., LR test of independence of sets 
of variates, (Exercises 41.10-11) 261-2, 
271; homogeneity tests, (Example 42.2) 
272, (Exercise 42.7) 282. 

Williams, E. J., canonical analysis, 306; 
discrimination, 326. 

Williams, W. H., unbiassed regression-type 
estimators, 219, 223. 

Wilson, K. B., evolutionary operation, 158. 

Winters, P. R., exponential weighting, 501. 

Wise, J., autoregressive series, 417, (Exercise 
47.17) 429. 

Wishart, J., distribution of multinormal covari- 
ances, 241-6, (Exercises 41.6-7) 261; non- 
central, 259; correlation between normal 
covariances, (Exercise 41.4) 260; distribu- 
tion of covariance, (Exercise 41.13) 262; 
form of distribution of residual dispersion 
matrix, 275-6. 

Wold, H., moving average series, 415, 484; 
autoregressive series, 418, (Exercise 47.6) 
427; causal models, 499, 500. 

Wolfowitz, J., regression designs, 161; phases 
test, 354; rank serial correlation test, 360. 

Working, H., grouping in Markoff series, 
(Exercise 47.15) 428. 




Yates, F., method of weighted squares of means 
for two-way classification, 30-1, (Exercises 
35.5-7) 53; missing observations, 111, 113; 
BIB designs, 142; inter-block information, 
148, 150; lattice designs, 153; confounding, 
157; surveys, 166; sampling with unequal 
probabilities, 173; systematic sampling, 
188; estimation of variance in multi-stage 
sampling, 199, 201, 204; efficiency of 
multi-stage sampling, 204; two-phase 
sampling, 228; domains of study, 229; 
sampling on successive occasions, (Exer- 
cises 40.9-10) 236-7. 

Youden, W. J., random balance experiments, 
130; squares, 151-2. 

Young, D. H., quota sampling, (Exercise 
40.11) 238. 

Youtz, C., tables of transformations, 90, 
(Exercise 37.4) 114. 

Yule, G. U., Slutzky-Yule effect, 378; equations 
for autoregressive series, 416; see Yule 
series. 

Yule series, correlogram and spectrum, 
(Example 47.8) 419-20; limiting case, 
(Example 47.10) 420-1; continuous ana- 
logue, (Example 47.11) 423; partial auto- 
correlations, 425; variance, (Exercise 47.8) 
427; autocorrelations of residuals, (Exer- 
cise 47.10) 427; standard error of serial 
correlation, (Exercises 48.2, 48.14) 450, 
452; serial correlations with errors of 
observation, (Exercise 49.11) 471; test of fit, 
(Example 50.3) 480-1; multivariate, 496. 


Zaycoff, R., variate-difference method, 390. 


